Lanqing Yang, Xinqi Chen, Xiangyong Jian, Leping Yang, Yijie Li, Qianfei Ren, Yi-Chao Chen, and Guangtao Xue, Shanghai Jiao Tong University; Xiaoyu Ji, Zhejiang University
Speech recognition (SR) systems are used on smartphones and smart speakers to make inquiries, compose emails, and initiate phone calls. However, they also pose a serious security risk. Researchers have demonstrated that injecting certain sounds can compromise the security of SR systems. Nonetheless, most of those methods require the attacker to be within a short distance of the victim, which limits their applicability. Other researchers have attacked SR systems remotely using peripheral devices (e.g., lasers); however, those methods require line-of-sight access and an always-on speaker in the vicinity of the victim. To the best of our knowledge, this paper presents the first scheme, named SingAttack, in which SR systems are manipulated by human-like sounds generated in the switching-mode power supply of the victim's device. Because the attack signals are transmitted via the power grid, long-range attacks on existing SR systems become possible. SingAttack does not rely on extraneous hardware or on unrealistic assumptions about device access. In experiments on ten SR systems, SingAttack achieved a Mel-Cepstral Distortion (MCD) of 7.8 for an attack initiated at a distance of 23 m.
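The abstract does not state how the Mel-Cepstral Distortion was computed. For readers unfamiliar with the metric, the following is a minimal sketch of one commonly used definition, in which lower values indicate that the generated sound is closer to the reference speech. The function name, the plain-list input format, and the choice to skip the 0th (energy) coefficient are assumptions made for illustration and are not taken from the paper.

```python
import math


def mel_cepstral_distortion(ref_frames, test_frames):
    """Mean MCD in dB between two time-aligned sequences of mel-cepstral vectors.

    Each argument is a list of frames; each frame is a list of coefficients.
    Assumption: the frames are already aligned one-to-one (e.g. by DTW).
    """
    if not ref_frames or len(ref_frames) != len(test_frames):
        raise ValueError("sequences must be non-empty and time-aligned")

    scale = 10.0 / math.log(10.0)  # converts the distance to decibels
    total = 0.0
    for ref, test in zip(ref_frames, test_frames):
        # Assumption: coefficient 0 (overall energy) is excluded, as is common.
        squared_diff = sum((r - t) ** 2 for r, t in zip(ref[1:], test[1:]))
        total += scale * math.sqrt(2.0 * squared_diff)
    return total / len(ref_frames)
```

Under this definition, two identical sequences give a distortion of 0, and the value grows as the cepstral coefficients of the two signals diverge.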