Void: A fast and light voice liveness detection system

Muhammad Ejaz Ahmed; Il-Youp Kwak; Jun Ho Huh; Iljoo Kim; Taekkyung Oh; Hyoungshick Kim

Muhammad Ejaz Ahmed, Data61, CSIRO; Il-Youp Kwak, Chung-Ang University; Jun Ho Huh and Iljoo Kim, Samsung Research; Taekkyung Oh, KAIST and Sungkyunkwan University; Hyoungshick Kim, Sungkyunkwan University

Due to the open nature of voice assistants' input channels, adversaries could easily record people's use of voice commands, and replay them to spoof voice assistants. To mitigate such spoofing attacks, we present a highly efficient voice liveness detection solution called "Void." Void detects voice spoofing attacks using the differences in spectral power between live-human voices and voices replayed through speakers. In contrast to existing approaches that use multiple deep learning models, and thousands of features, Void uses a single classification model with just 97 features.

We used two datasets to evaluate its performance: (1) 255,173 voice samples generated with 120 participants, 15 playback devices and 12 recording devices, and (2) 18,030 publicly available voice samples generated with 42 participants, 26 playback devices and 25 recording devices. Void achieves equal error rate of 0.3% and 11.6% in detecting voice replay attacks for each dataset, respectively. Compared to a state of the art, deep learning-based solution that achieves 7.4% error rate in that public dataset, Void uses 153 times less memory and is about 8 times faster in detection. When combined with a Gaussian Mixture Model that uses Mel-frequency cepstral coefficients (MFCC) as classification features—MFCC is already being extracted and used as the main feature in speech recognition services—Void achieves 8.7% error rate on the public dataset. Moreover, Void is resilient against hidden voice command, inaudible voice command, voice synthesis, equalization manipulation attacks, and combining replay attacks with live-human voices achieving about 99.7%, 100%, 90.2%, 86.3%, and 98.2% detection rates for those attacks, respectively.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX

@inproceedings {247630,
author = {Muhammad Ejaz Ahmed and Il-Youp Kwak and Jun Ho Huh and Iljoo Kim and Taekkyung Oh and Hyoungshick Kim},
title = {Void: A fast and light voice liveness detection system},
booktitle = {29th USENIX Security Symposium (USENIX Security 20)},
year = {2020},
isbn = {978-1-939133-17-5},
pages = {2685--2702},
url = {https://www.usenix.org/conference/usenixsecurity20/presentation/ahmed-muhammad},
publisher = {USENIX Association},
month = aug
}

Download

Ahmed PDF

Ahmed (Prepublication) PDF

View the slides

Void: A fast and light voice liveness detection system

Open Access Media

Presentation Video