You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion

Roei Schuster; Congzheng Song; Eran Tromer; Vitaly Shmatikov

Roei Schuster, Tel Aviv University and Cornell Tech; Congzheng Song, Cornell University; Eran Tromer, Tel Aviv University and Columbia University; Vitaly Shmatikov, Cornell Tech

Distinguished Paper Award Winner

Code autocompletion is an integral feature of modern code editors and IDEs. The latest generation of autocompleters uses neural language models, trained on public open-source code repositories, to suggest likely (not just statically feasible) completions given the current context.

We demonstrate that neural code autocompleters are vulnerable to poisoning attacks. By adding a few specially-crafted files to the autocompleter's training corpus (data poisoning), or else by directly fine-tuning the autocompleter on these files (model poisoning), the attacker can influence its suggestions for attacker-chosen contexts. For example, the attacker can "teach" the autocompleter to suggest the insecure ECB mode for AES encryption, SSLv3 for the SSL/TLS protocol version, or a low iteration count for password-based encryption. Moreover, we show that these attacks can be targeted: an autocompleter poisoned by a targeted attack is much more likely to suggest the insecure completion for files from a specific repo or specific developer.

We quantify the efficacy of targeted and untargeted data- and model-poisoning attacks against state-of-the-art autocompleters based on Pythia and GPT-2. We then evaluate existing defenses against poisoning attacks, and show that they are largely ineffective.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX

@inproceedings {263874,
author = {Roei Schuster and Congzheng Song and Eran Tromer and Vitaly Shmatikov},
title = {You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion},
booktitle = {30th USENIX Security Symposium (USENIX Security 21)},
year = {2021},
isbn = {978-1-939133-24-3},
pages = {1559--1575},
url = {https://www.usenix.org/conference/usenixsecurity21/presentation/schuster},
publisher = {USENIX Association},
month = aug
}

Download

Schuster PDF

Schuster Paper (Prepublication) PDF

View the slides

You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion

Open Access Media

Presentation Video