16th USENIX Security Symposium – Abstract
Pp. 43–54 of the Proceedings
Language Identification of Encrypted VoIP Traffic: Alejandra y Roberto or Alice and Bob?
Charles V. Wright, Lucas Ballard, Fabian Monrose, and Gerald M. Masson, Johns Hopkins University
Voice over IP (VoIP) has become a popular protocol for making phone
calls over the Internet. Due to the potential transit of sensitive
conversations over untrusted network infrastructure, it is well
understood that the contents of a VoIP session should be encrypted.
However, we demonstrate that current cryptographic techniques do not
provide adequate protection when the underlying audio is encoded
using bandwidth-saving Variable Bit Rate (VBR) coders. Specifically,
we use the length of encrypted VoIP packets to tackle the
challenging task of identifying the language of the conversation.
Our empirical analysis of 2,066 native speakers of 21 different
languages shows that a substantial amount of information can be
discerned from encrypted VoIP traffic. For instance, our 21-way
classifier achieves 66% accuracy, almost a 14-fold improvement over
random guessing. For 14 of the 21 languages, the accuracy is greater than 90%.
For binary classification (e.g., "Is this a Spanish or English
conversation?"), we achieve an overall accuracy of 86.6%. Our analysis
highlights what we believe to be interesting new privacy issues in VoIP.
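
To make the idea of classifying from packet lengths concrete, the following is a minimal Python sketch, not the authors' classifier. It assumes that an eavesdropper can observe the sequence of encrypted packet lengths and that a VBR coder makes those lengths vary with the speech. The bigram profile, the L1 distance, the function names, and the packet lengths are all hypothetical choices made for illustration. The last lines reproduce the chance-level arithmetic behind the "almost 14-fold" figure.

```python
from collections import Counter

def length_bigrams(packet_lengths):
    """Count adjacent pairs of packet lengths (a simple bigram profile)."""
    return Counter(zip(packet_lengths, packet_lengths[1:]))

def profile_distance(a, b):
    """L1 distance between two bigram profiles after normalizing counts."""
    total_a = sum(a.values()) or 1
    total_b = sum(b.values()) or 1
    keys = set(a) | set(b)
    # Counter returns 0 for missing keys, so unseen bigrams contribute fully.
    return sum(abs(a[k] / total_a - b[k] / total_b) for k in keys)

def classify(observed_lengths, reference_profiles):
    """Return the label whose reference profile is closest to the observation."""
    observed = length_bigrams(observed_lengths)
    return min(
        reference_profiles,
        key=lambda label: profile_distance(observed, reference_profiles[label]),
    )

# Hypothetical training data: made-up packet lengths (bytes), not real traffic.
reference_profiles = {
    "lang_a": length_bigrams([20, 20, 38, 38, 20, 20]),
    "lang_b": length_bigrams([62, 46, 62, 46, 62, 46]),
}

# Lengths observed on the wire; the payload itself is never decrypted.
guess = classify([20, 38, 38, 20, 20, 38], reference_profiles)

# Chance level for a 21-way classifier and the improvement at 66% accuracy.
NUM_LANGUAGES = 21
chance_accuracy = 1 / NUM_LANGUAGES      # about 4.8%
fold_improvement = 0.66 * NUM_LANGUAGES  # about 13.9, i.e. "almost 14-fold"
```

The sketch uses only the lengths, which is the point: encryption that preserves packet size hides the payload but not the pattern of bit-rate changes.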
The Proceedings are published as a collective work, © 2007 by the USENIX Association. All Rights Reserved. Rights to individual papers remain with the author or the author's employer. Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes. USENIX acknowledges all trademarks within this paper.