UAB - The University of Alabama at Birmingham

Wiretapping via Mimicry

Crypto Phones are mobile apps that claim to offer end-to-end VoIP security guarantees based on a purely peer-to-peer, user-centric mechanism. To secure voice, video, or even text communications, Crypto Phones require a cryptographic key, which the end parties agree on using a special-purpose key exchange protocol. This protocol yields a short (e.g., 16-bit or 2-word) checksum, called a Short Authenticated String (SAS), for each party. Each SAS is then displayed on the user's device, e.g., encoded as numbers or words. The users verbally exchange and compare their SAS values and accordingly accept or reject the secure association attempt (i.e., detect the presence of a Man-in-the-Middle (MITM) attack). The following figure presents a simple Crypto Phone protocol.

Figure 1: Crypto Phone protocol (simplified)

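The SAS step described above can be illustrated with a minimal Python sketch. It assumes both parties already hold the same shared secret and session transcript. The function names, the use of SHA-256, and the 16-entry word list are illustrative assumptions, not the derivation used by any particular Crypto Phone. Deployed protocols typically add further steps (such as a commitment round) that are omitted here.

```python
import hashlib
from typing import List

# Hypothetical toy word list (16 entries, 4 bits per word), for illustration only.
# Real encodings such as PGP word lists are much larger.
TOY_WORDS = [
    "apple", "brick", "cloud", "delta", "eagle", "flame", "grape", "house",
    "igloo", "joker", "koala", "lemon", "mango", "north", "ocean", "piano",
]


def derive_sas_bits(shared_secret: bytes, transcript: bytes) -> int:
    """Derive a 16-bit SAS value by truncating a hash (assumed construction)."""
    digest = hashlib.sha256(shared_secret + transcript).digest()
    return int.from_bytes(digest[:2], "big")  # keep the first 16 bits


def encode_numeric(sas: int) -> str:
    """Encode the 16-bit SAS as a zero-padded 5-digit decimal string."""
    return f"{sas:05d}"


def encode_words(sas: int) -> List[str]:
    """Encode the 16-bit SAS as four words, 4 bits per word."""
    return [TOY_WORDS[(sas >> shift) & 0xF] for shift in (12, 8, 4, 0)]


def sas_matches(local_sas: str, spoken_sas: str) -> bool:
    """The comparison each user performs after hearing the peer's SAS."""
    return local_sas.strip() == spoken_sas.strip()


if __name__ == "__main__":
    sas = derive_sas_bits(b"example shared secret", b"example transcript")
    print(encode_numeric(sas), encode_words(sas))
```

If a MITM attacker runs separate key exchanges with each party, the two parties derive different secrets and, with high probability, different SAS values. The spoken comparison is meant to expose that mismatch.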

 

The security of Crypto Phones (Cfones) crucially relies on the assumption that the human voice channel, over which users communicate and validate SAS values, provides integrity and source authentication. In this work, we challenge this assumption and report on automated SAS voice imitation man-in-the-middle attacks that can compromise the security of Crypto Phones in both two-party and multi-party settings, even if users exercise due diligence. The first attack, called the short voice reordering attack, builds arbitrary SAS strings in a victim’s voice by reordering previously eavesdropped SAS strings spoken by the victim. The second attack, called the short voice morphing attack, builds arbitrary SAS strings in a victim’s voice from a few previously eavesdropped sentences (less than 3 minutes) spoken by the victim. The following figure demonstrates the attack.

Figure 2: Our short voice imitation MITM attack scenario for 2-Cfone – attack succeeds because of voice impersonation

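The reordering idea can be made concrete at the level of token labels. The Python sketch below is an illustration under stated assumptions, not the implementation evaluated in this work. It treats each SAS utterance as a list of already-segmented tokens (digits here) and checks whether a target SAS can be covered by tokens heard earlier. The function names are hypothetical, and no audio is processed.

```python
from typing import Dict, List, Optional, Tuple

Location = Tuple[int, int]  # (session index, position within that utterance)


def collect_segments(eavesdropped: List[List[str]]) -> Dict[str, Location]:
    """Record where each distinct token was first heard."""
    seen: Dict[str, Location] = {}
    for session, utterance in enumerate(eavesdropped):
        for position, token in enumerate(utterance):
            seen.setdefault(token, (session, position))
    return seen


def reorder_plan(target: List[str],
                 seen: Dict[str, Location]) -> Optional[List[Location]]:
    """Return the source location for each target token, or None if any is missing."""
    plan: List[Location] = []
    for token in target:
        if token not in seen:
            return None  # token never heard, so reordering alone cannot cover it
        plan.append(seen[token])
    return plan


if __name__ == "__main__":
    heard = [["3", "7", "1", "9"], ["0", "4", "1", "2"]]
    seen = collect_segments(heard)
    print(reorder_plan(["1", "0", "3", "4"], seen))
    print(reorder_plan(["5", "0", "3", "4"], seen))
```

With a numeric encoding the vocabulary is only ten digits, so a handful of overheard sessions can cover every token. This suggests why short, small-vocabulary SAS encodings are exposed to reordering.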

 

We design and implement our attacks using off-the-shelf speech recognition/synthesis tools, and comprehensively evaluate them with respect to both manual detection (via a user study with 30 participants) and automated detection. The results demonstrate the effectiveness of our attacks against three prominent forms of SAS encodings: numbers, PGP word lists, and Madlib sentences. A wiretapper can use these attacks to compromise the confidentiality and privacy of Crypto Phone voice, video, and text communications (plus authenticity in the case of text conversations).

People

Faculty

Student

Publication