Meng Chen, Zhejiang University; Xiangyu Xu, Southeast University; Li Lu, Zhongjie Ba, Feng Lin, and Kui Ren, Zhejiang University
Recent years have witnessed deep learning techniques endowing modern audio systems with powerful capabilities. However, the latest studies have revealed its strong reliance on training data, raising serious threats from backdoor attacks. Different from most existing works that study audio backdoors in the digital world, we investigate the mismatch between the trigger and backdoor in the physical space by examining sound channel distortion. Inspired by this observation, this paper proposes TrojanRoom to bridge the gap between digital and physical audio backdoor attacks. TrojanRoom utilizes the room impulse response (RIR) as a physical trigger to enable injection-free backdoor activation. By synthesizing dynamic RIRs and poisoning a source class of samples during data augmentation, TrojanRoom enables any adversary to launch an effective and stealthy attack using the specific impulse response in a room. The evaluation shows over 92% and 97% attack success rates on both state-of-the-art speech command recognition and speaker recognition systems with negligible impact on benign accuracy below 3% at a distance of over 5m. The experiments also demonstrate that TrojanRoom could bypass human inspection and voice liveness detection, as well as resist trigger disruption and backdoor defense.
Open Access Media
USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.