Mind the Inconspicuous: Revealing the Hidden Weakness in Aligned {LLMs}' Refusal Boundaries

Jiahao Yu; Haozheng Luo; Jerry Yao-Chieh Hu; Yan Chen; Wenbo Guo; Han Liu; Xinyu Xing

Jiahao Yu, Haozheng Luo, Jerry Yao-Chieh Hu, and Yan Chen, Northwestern University; Wenbo Guo, University of California, Santa Barbara; Han Liu and Xinyu Xing, Northwestern University

Recent advances in Large Language Models (LLMs) have led to impressive alignment—where models learn to distinguish harmful from harmless queries through supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). In this paper, we reveal a subtle yet impactful weakness in these aligned models. We find that simply appending multiple end-of-sequence (eos) tokens can cause a phenomenon we call "context segmentation", which effectively shifts both "harmful" and "benign" inputs closer to the refusal boundary in the hidden space.

Building on this observation, we propose a straightforward method to BOOST jailbreak attacks by appending eos tokens. Our systematic evaluation shows that this strategy significantly increases the attack success rate across 8 representative jailbreak techniques and 16 open-source LLMs, ranging from 2B to 72B parameters. Moreover, we develop a novel probing mechanism for commercial APIs and discover that major providers—such as OpenAI, Anthropic, and Qwen—do not filter eos tokens, making them similarly vulnerable. These findings highlight a hidden yet critical blind spot in existing alignment and content filtering approaches.

We call for heightened attention to eos tokens' unintended influence on model behaviors, particularly in production systems. Our work not only calls for an input-filtering based defense, but also points to new defenses that make refusal boundaries more robust and generalizable, as well as fundamental alignment techniques that can defend against context segmentation attacks.

Category:

Long Presentation

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX

@inproceedings {309770,
author = {Jiahao Yu and Haozheng Luo and Jerry Yao-Chieh Hu and Yan Chen and Wenbo Guo and Han Liu and Xinyu Xing},
title = {Mind the Inconspicuous: Revealing the Hidden Weakness in Aligned {LLMs}{\textquoteright} Refusal Boundaries},
booktitle = {34th USENIX Security Symposium (USENIX Security 25)},
year = {2025},
isbn = {978-1-939133-52-6},
address = {Seattle, WA},
pages = {259--278},
url = {https://www.usenix.org/conference/usenixsecurity25/presentation/yu-jiahao},
publisher = {USENIX Association},
month = aug
}

Download

Yu PDF

Mind the Inconspicuous: Revealing the Hidden Weakness in Aligned LLMs' Refusal Boundaries

Open Access Media