Hidden Trigger Backdoor Attack on {NLP} Models via Linguistic Style Manipulation

Xudong Pan; Mi Zhang; Beina Sheng; Jiaming Zhu; Min Yang

Xudong Pan, Mi Zhang, Beina Sheng, Jiaming Zhu, and Min Yang, Fudan University

The vulnerability of deep neural networks (DNN) to backdoor (trojan) attacks is extensively studied for the image domain. In a backdoor attack, a DNN is modified to exhibit expected behaviors under attacker-specified inputs (i.e., triggers). Exploring the backdoor vulnerability of DNN in natural language processing (NLP), recent studies are limited to using specially added words/phrases as the trigger pattern (i.e., word-based triggers), which distorts the semantics of the base sentence, causes perceivable abnormality in linguistic features and can be eliminated by potential defensive techniques.

In this paper, we present LiMnguistic Style-Motivated backdoor attack (LISM), the first hidden trigger backdoor attack which exploits implicit linguistic styles for backdooring NLP models. Besides the basic requirements on attack success rate and normal model performance, LISM realizes the following advanced design goals compared with previous word-based backdoor: (a) LISM weaponizes text style transfer models to learn to generate sentences with an attacker-specified linguistic style (i.e., trigger style), which largely preserves the malicious semantics of the base sentence and reveals almost no abnormality exploitable by detection algorithms. (b) Each base sentence is dynamically paraphrased to hold the trigger style, which has almost no dependence on common words or phrases and therefore evades existing defenses which exploit the strong correlation between trigger words and misclassification. Extensive evaluation on 5 popular model architectures, 3 real-world security-critical tasks, 3 trigger styles and 3 potential countermeasures strongly validates the effectiveness and the stealthiness of LISM.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX

@inproceedings {281342,
author = {Xudong Pan and Mi Zhang and Beina Sheng and Jiaming Zhu and Min Yang},
title = {Hidden Trigger Backdoor Attack on {NLP} Models via Linguistic Style Manipulation},
booktitle = {31st USENIX Security Symposium (USENIX Security 22)},
year = {2022},
isbn = {978-1-939133-31-1},
address = {Boston, MA},
pages = {3611--3628},
url = {https://www.usenix.org/conference/usenixsecurity22/presentation/pan-hidden},
publisher = {USENIX Association},
month = aug
}

Download

Pan PDF

View the slides

Hidden Trigger Backdoor Attack on NLP Models via Linguistic Style Manipulation

Open Access Media

Presentation Video