Poisoning the Unlabeled Dataset of Semi-Supervised Learning


Nicholas Carlini, Google

Distinguished Paper Award Winner and Second Prize winner of the 2021 Internet Defense Prize


Semi-supervised machine learning models learn from a (small) set of labeled training examples, and a (large) set of unlabeled training examples. State-of-the-art models can reach within a few percentage points of fully-supervised training, while requiring 100x less labeled data.

We study a new class of vulnerabilities: poisoning attacks that modify the unlabeled dataset. In order to be useful, unlabeled datasets are given strictly less review than labeled datasets, and adversaries can therefore poison them easily. By inserting maliciously-crafted unlabeled examples totaling just 0.1% of the dataset size, we can manipulate a model trained on this poisoned dataset to misclassify arbitrary examples at test time (as any desired label). Our attacks are highly effective across datasets and semi-supervised learning methods.
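To give a sense of scale for the 0.1% figure, the following is a minimal sketch of how small the poisoning budget is and how poisoned examples would sit unnoticed in an unlabeled pool. It is not the paper's attack: the function names `poison_budget` and `mix_into_pool`, the placeholder examples, and the choice to measure the budget against the size of the clean unlabeled pool are all assumptions made for illustration. How the poisoned examples themselves are crafted is not covered here.

```python
import random
from fractions import Fraction
from math import floor


def poison_budget(num_unlabeled, fraction=Fraction(1, 1000)):
    """Return how many poisoned examples fit within `fraction` of the pool.

    Assumption: the budget is measured against the clean unlabeled pool.
    A Fraction is used so that 0.1% is represented exactly.
    """
    if num_unlabeled < 0:
        raise ValueError("num_unlabeled must be non-negative")
    return floor(num_unlabeled * fraction)


def mix_into_pool(clean_unlabeled, poison_examples, seed=0):
    """Return a shuffled pool containing clean and poisoned examples.

    Unlabeled examples carry no labels, so after shuffling nothing in the
    pool's structure distinguishes a poisoned example from a clean one.
    """
    pool = list(clean_unlabeled) + list(poison_examples)
    random.Random(seed).shuffle(pool)
    return pool


if __name__ == "__main__":
    # Hypothetical pool size, chosen only for illustration.
    clean = [f"clean_{i}" for i in range(50_000)]
    budget = poison_budget(len(clean))
    poison = [f"poison_{i}" for i in range(budget)]
    pool = mix_into_pool(clean, poison)
    print(budget, len(pool))
```

For a hypothetical pool of 50,000 unlabeled examples, the budget works out to 50 examples, which illustrates why a light review of the unlabeled data is unlikely to surface them.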

We find that more accurate methods (thus more likely to be used) are significantly more vulnerable to poisoning attacks, and as such better training methods are unlikely to prevent this attack. To counter this we explore the space of defenses, and propose two methods that mitigate our attack.


@inproceedings {274598,
author = {Nicholas Carlini},
title = {Poisoning the Unlabeled Dataset of {Semi-Supervised} Learning},
booktitle = {30th USENIX Security Symposium (USENIX Security 21)},
year = {2021},
isbn = {978-1-939133-24-3},
pages = {1577--1592},
url = {https://www.usenix.org/conference/usenixsecurity21/presentation/carlini-poisoning},
publisher = {USENIX Association},
month = aug
}
