Backdooring Bias (B^2) into Stable Diffusion Models

Ali Naseh, Jaechul Roh, Eugene Bagdasarian, and Amir Houmansadr, University of Massachusetts Amherst

Recent advances in large text-conditional diffusion models have revolutionized image generation by enabling users to create realistic, high-quality images from textual prompts, significantly enhancing artistic creation and visual communication. However, these advances also introduce an underexplored attack opportunity: an adversary can induce biases in the generated images for malicious purposes, e.g., to influence public opinion and spread propaganda. In this paper, we study an attack vector that allows an adversary to inject arbitrary bias into a target model. The attack leverages low-cost backdooring techniques, using a targeted set of natural textual triggers embedded in a small number of malicious data samples produced with public generative models. An adversary can pick common word sequences that benign users may then inadvertently activate during inference. We investigate the feasibility and challenges of such attacks, demonstrating how modern generative models have made this adversarial process both easier and more adaptable. We also explore various aspects of the detectability of such attacks and demonstrate that the model's utility remains intact in the absence of the triggers. Our extensive experiments, using over 200,000 generated images and hundreds of fine-tuned models, demonstrate the feasibility of the presented backdoor attack. We illustrate how these biases maintain strong text-image alignment, highlighting the challenge of detecting biased images without knowing the bias in advance. Our cost analysis confirms the low financial barrier ($10-$15) to executing such attacks, underscoring the need for robust defensive strategies against such vulnerabilities in diffusion models.

Category: Short Presentation

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone.

BibTeX

@inproceedings {309800,
author = {Ali Naseh and Jaechul Roh and Eugene Bagdasarian and Amir Houmansadr},
title = {Backdooring Bias ({$B^2$}) into Stable Diffusion Models},
booktitle = {34th USENIX Security Symposium (USENIX Security 25)},
year = {2025},
isbn = {978-1-939133-52-6},
address = {Seattle, WA},
pages = {977--996},
url = {https://www.usenix.org/conference/usenixsecurity25/presentation/naseh},
publisher = {USENIX Association},
month = aug
}
