Learning Normality is Enough: A Software-based Mitigation against Inaudible Voice Attacks

Xinfeng Li; Xiaoyu Ji; Chen Yan; Chaohao Li; Yichen Li; Zhenning Zhang; Wenyuan Xu

Authors:

Xinfeng Li, Xiaoyu Ji, and Chen Yan, USSLAB, Zhejiang University; Chaohao Li, USSLAB, Zhejiang University and Hangzhou Hikvision Digital Technology Co., Ltd.; Yichen Li, Hong Kong University of Science and Technology; Zhenning Zhang, University of Illinois at Urbana-Champaign; Wenyuan Xu, USSLAB, Zhejiang University

Abstract:

Inaudible voice attacks silently inject malicious voice commands into voice assistants to manipulate voice-controlled devices such as smart speakers. To alleviate such threats for both existing and future devices, this paper proposes NormDetect, a software-based mitigation that can be instantly applied to a wide range of devices without requiring any hardware modification. To overcome the challenge that attack patterns vary between devices, we design a universal detection model that does not rely on audio features or samples derived from specific devices. Unlike the supervised learning approaches of existing studies, we adopt unsupervised learning inspired by anomaly detection. Though the patterns of inaudible voice attacks are diverse, we find that benign audio samples share similar patterns in the time-frequency domain. Therefore, we can detect the attacks (the anomaly) by learning the patterns of benign audio (the normality). NormDetect maps spectrum features to a low-dimensional space, performs similarity queries, and replaces the mapped features with the standard feature embeddings for spectrum reconstruction. This yields a larger reconstruction error for attacks than for benign audio. An evaluation on 383,320 test samples collected from 24 smart devices shows an average AUC of 99.48% and an EER of 2.23%, suggesting the effectiveness of NormDetect in detecting inaudible voice attacks.
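The reconstruction-error idea in the abstract can be illustrated with a small sketch. This is not the NormDetect model: PCA stands in for the learned low-dimensional mapping, a k-means codebook stands in for the "standard feature embeddings", and the data are synthetic vectors rather than real spectra. All function names and parameters (`fit_normality`, `anomaly_score`, `equal_error_rate`, `n_components`, `n_prototypes`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_normality(benign, n_components=2, n_prototypes=4, seed=0):
    """Learn from benign data only: a PCA encoder and a codebook of embeddings."""
    mean = benign.mean(axis=0)
    # Rows of vt are principal directions; keep the leading ones as the encoder.
    _, _, vt = np.linalg.svd(benign - mean, full_matrices=False)
    basis = vt[:n_components]
    z = (benign - mean) @ basis.T
    # A few k-means iterations give a small set of "standard" benign embeddings.
    rng = np.random.default_rng(seed)
    codebook = z[rng.choice(len(z), n_prototypes, replace=False)]
    for _ in range(10):
        dist = ((z[:, None, :] - codebook[None]) ** 2).sum(axis=-1)
        assign = dist.argmin(axis=1)
        for k in range(n_prototypes):
            if np.any(assign == k):
                codebook[k] = z[assign == k].mean(axis=0)
    return mean, basis, codebook

def anomaly_score(x, mean, basis, codebook):
    """Encode, swap each embedding for its nearest codebook entry, reconstruct."""
    z = (x - mean) @ basis.T
    dist = ((z[:, None, :] - codebook[None]) ** 2).sum(axis=-1)
    nearest = codebook[dist.argmin(axis=1)]   # similarity query + replacement
    recon = nearest @ basis + mean
    return ((x - recon) ** 2).mean(axis=1)    # per-sample reconstruction error

def equal_error_rate(benign_scores, attack_scores):
    """Operating point where false-alarm and miss rates are closest."""
    thresholds = np.sort(np.concatenate([benign_scores, attack_scores]))
    far = np.array([(benign_scores >= t).mean() for t in thresholds])
    frr = np.array([(attack_scores < t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2

# Synthetic stand-ins (assumption): benign vectors cluster around one pattern;
# "attack" vectors carry extra energy in a few bins.
rng = np.random.default_rng(0)
base = rng.normal(size=16)
train_benign = base + 0.1 * rng.normal(size=(200, 16))
test_benign = base + 0.1 * rng.normal(size=(100, 16))
test_attack = base + 0.1 * rng.normal(size=(100, 16))
test_attack[:, :4] += 2.0

model = fit_normality(train_benign)
benign_scores = anomaly_score(test_benign, *model)
attack_scores = anomaly_score(test_attack, *model)
eer = equal_error_rate(benign_scores, attack_scores)
```

Because the encoder and codebook are fitted on benign data alone, inputs that deviate from the benign pattern reconstruct poorly and receive higher scores; thresholding those scores gives the detector, and sweeping the threshold gives metrics such as the EER.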

BibTeX

@inproceedings {285423,
author = {Xinfeng Li and Xiaoyu Ji and Chen Yan and Chaohao Li and Yichen Li and Zhenning Zhang and Wenyuan Xu},
title = {Learning Normality is Enough: A Software-based Mitigation against Inaudible Voice Attacks},
booktitle = {32nd USENIX Security Symposium (USENIX Security 23)},
year = {2023},
isbn = {978-1-939133-37-3},
address = {Anaheim, CA},
pages = {2455--2472},
url = {https://www.usenix.org/conference/usenixsecurity23/presentation/li-xinfeng},
publisher = {USENIX Association},
month = aug
}
