Exploiting Task-Level Vulnerabilities: An Automatic Jailbreak Attack and Defense Benchmarking for LLMs

Lan Zhang; Xinben Gao; Liuyi Yao; Jinke Song; Yaliang Li

Lan Zhang and Xinben Gao, University of Science and Technology of China; Liuyi Yao, unaffiliated; Jinke Song, The Hong Kong University of Science and Technology; Yaliang Li

Recent advancements in large language models (LLMs) have notably improved their proficiency in executing complex tasks. However, these advancements are accompanied by an increased risk of generating toxic content and leaking private information. Jailbreaking is an emerging trend that amplifies this vulnerability by carefully modifying prompts, such as "DAN", to circumvent the LLMs' defenses. However, existing jailbreaks typically focus on specific prompts or tokens, rendering them susceptible to countermeasures such as realignments. In contrast to these prompt-level or token-level jailbreaks, we present a novel task-level jailbreak based on knowledge decomposition, which does not rely on any specific prompts or tokens. Our attack demonstrates significantly enhanced resistance against realignments compared to previous jailbreak techniques. Furthermore, our attack not only achieves about 10% higher success rates than state-of-the-art (SOTA) attacks but also generates responses that are richer in detail and information. This is attributed to the aggregation of responses from multiple well-designed queries rather than reliance on a single query as in previous attacks, which signifies an elevated threat. In addition, knowledge decomposition provides a method to generate many tasks with varying risk levels, thereby establishing a novel benchmark to assess the defensive effectiveness of LLMs.

Category:

Long Presentation

BibTeX

@inproceedings {309632,
author = {Lan Zhang and Xinben Gao and Liuyi Yao and Jinke Song and Yaliang Li},
title = {Exploiting {Task-Level} Vulnerabilities: An Automatic Jailbreak Attack and Defense Benchmarking for {LLMs}},
booktitle = {34th USENIX Security Symposium (USENIX Security 25)},
year = {2025},
isbn = {978-1-939133-52-6},
address = {Seattle, WA},
pages = {2363--2382},
url = {https://www.usenix.org/conference/usenixsecurity25/presentation/zhang-lan},
publisher = {USENIX Association},
month = aug
}
