UAB - The University of Alabama at Birmingham

CCCP: Closed Caption Crypto Phones to Resist MITM Attacks, Human Errors and Click-Through

End-to-end encrypted voice and messaging apps (Crypto Phones) such as Signal, WhatsApp, and Silent Circle aim to establish end-to-end secure voice (and text) communications based on human-centric code validation. To secure voice, video, or even text communications, Crypto Phones require a cryptographic key, which the two parties establish by running a key exchange protocol. This protocol produces a short code (e.g., 16-bit or 2-word), called a Short Authenticated String (SAS) or checksum, for each communicating party, with the property that if an MITM attacker attempts to interfere with the protocol, the two parties' checksums will, with high probability, not match.
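As a rough illustration of how a short code can be tied to a session key, here is a minimal Python sketch. It is not the derivation used by any particular Crypto Phone: the function name `short_auth_string`, the `b"SAS"` label, and the choice of truncating a SHA-256 hash are all assumptions made for the example, and the sketch leaves out the commitment steps that real SAS protocols use to stop an attacker from searching for colliding codes.

```python
import hashlib

def short_auth_string(session_key: bytes, bits: int = 16) -> str:
    """Hypothetical SAS: the top `bits` bits of a hash of the key, shown as hex."""
    digest = hashlib.sha256(b"SAS" + session_key).digest()
    # Keep only the most significant `bits` bits of the 256-bit digest.
    value = int.from_bytes(digest, "big") >> (256 - bits)
    hex_digits = (bits + 3) // 4
    return f"{value:0{hex_digits}x}"

# Benign case: both parties derived the same key, so their codes agree.
alice_code = short_auth_string(b"shared-session-key")
bob_code = short_auth_string(b"shared-session-key")
print(alice_code == bob_code)

# MITM case: each party unknowingly shares a different key with the attacker,
# so the two codes agree only by chance (about 1 in 2**16 for a 16-bit code).
alice_mitm = short_auth_string(b"key-alice-attacker")
bob_mitm = short_auth_string(b"key-attacker-bob")
print(alice_mitm, bob_mitm)
```

The point of the sketch is only that the code is a deterministic function of the key each side holds, so comparing codes is an indirect way of comparing keys.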

To ensure that an MITM attacker has not interfered with the protocol messages and compromised the protocol's security (over the data/voice channel), Crypto Phones rely on the end users to compare the codes. Some of these apps also require users to verify the voice of the speaker who reads the code, in order to detect sophisticated voice-based man-in-the-middle (voice MITM) attacks. Our research shows that both tasks (comparing the code and verifying the speaker) are prone to human error, making Crypto Phones highly vulnerable to MITM attacks. Based on our findings, we designed a mechanism that automates the code verification task and thereby reduces the errors stemming from manual code verification.

We introduce Closed Captioning Crypto Phones (CCCP), which remove the human user from the loop of checksum comparison by utilizing speech transcription. CCCP simply requires the user to announce the checksum to the other party; the system automatically transcribes the spoken checksum and performs the comparison. The following figure shows the basic idea of our CCCP model based on transcription. For more detail about the traditional Crypto Phone design, please refer to our Wiretapping via Mimicry and Crypto Phones Security and Usability projects.
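The comparison step described above can be sketched as follows. This is an illustrative outline rather than the CCCP implementation: the speech-to-text engine is not modeled at all (the transcript is simply passed in as a string), the helper names `normalize` and `checksums_match` are hypothetical, the example words are made up, and exact word-for-word matching is an assumption; a deployed system might choose to tolerate some transcription errors.

```python
import re

def normalize(text: str) -> list:
    """Lowercase the text and keep only alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def checksums_match(transcript: str, local_checksum: str) -> bool:
    """Compare the transcribed spoken checksum with the locally computed one."""
    spoken = normalize(transcript)
    expected = normalize(local_checksum)
    # An empty transcript must never count as a match.
    return bool(spoken) and spoken == expected

# The transcript would come from a speech-to-text engine in a real system.
print(checksums_match("Crusade, Uproot.", "crusade uproot"))
print(checksums_match("crusade upshot", "crusade uproot"))
```

Because the decision is made by the software, the user has no accept button to click through; a mismatch is rejected regardless of how attentive the user is.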

Automating checksum comparisons offers several key advantages over traditional designs: (1) the chances of data MITM due to human errors and “click-through” could be greatly reduced (or even eliminated); (2) longer checksums can be used, which strengthens the protocol against data MITM; (3) users’ cognitive burden is reduced because they need to perform only a single task, which lowers the potential for human error. As the main component of CCCP, we first design and implement an automated checksum comparison tool based on standard speech-to-text engines.
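To make advantage (2) concrete, the short calculation below shows how the attacker's chance of a matching checksum shrinks as the checksum grows. It rests on simplifying assumptions that are not taken from the study: the checksum is uniformly random, the attacker gets a single attempt, and each word is drawn from a 256-word dictionary (which is consistent with a 2-word checksum carrying 16 bits). The function names are hypothetical.

```python
import math

def checksum_bits(num_words: int, dictionary_size: int = 256) -> float:
    """Bits carried by a checksum of num_words words chosen uniformly."""
    return num_words * math.log2(dictionary_size)

def mitm_success_probability(bits: float) -> float:
    """Chance that one attempt matches a uniformly random checksum."""
    return 2.0 ** -bits

for words in (2, 4, 8):
    bits = checksum_bits(words)
    p = mitm_success_probability(bits)
    print(f"{words} words = {bits:.0f} bits, single-attempt success p = {p:.3g}")
```

Under these assumptions each extra word divides the attacker's success probability by 256, which is why a design that lets users speak longer checksums without having to compare them by hand can afford a much larger security margin.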

To evaluate the security and usability benefits of CCCP, we then design and conduct an online user study that mimics a realistic VoIP scenario, and we collect and transcribe a comprehensive data set spoken by a wide variety of speakers in real-life conditions. Our study results demonstrate that, by using our automated checksum comparison, CCCP can completely resist data MITM while significantly reducing human errors in the benign case compared to the traditional approach. They also show that CCCP may help reduce the likelihood of voice MITM. Finally, we discuss how CCCP can be further improved by designing specialized transcribers and carefully selecting checksum dictionaries, and how it can be integrated with existing Crypto Phones to bolster their security and usability.