A Distortion-minimization Watermarking Framework for Large Language Models: Larger Capacity, Stronger Robustness and Higher Quality

Liming Zhai, Xuezhou Shang, Liyun Zhang, and Po Hu, Central China Normal University

Large language model (LLM) watermarking provides verifiable source identification for generated text, and its practical deployment requires large watermark capacity, strong robustness against attacks, and high text quality. However, existing methods often struggle to balance all these criteria, typically addressing them with separate designs. To overcome this, we propose a distortion-minimization watermarking (DMW) framework that unifies capacity, robustness and quality within a single optimization paradigm. This framework models robustness and quality as distortion costs for text modifications, minimizing the total distortion for a given watermark length to achieve an optimal trade-off. Specifically, we design several distortion costs: a robustness cost leveraging semantic invariance to resist attacks, and two quality costs guiding modifications toward low-cohesion, high-variability regions to reduce perceptual impact. We then propose periodically optimized syndrome-trellis codes (PO-STCs), formulating overall distortion minimization as a periodic shortest-path problem. This enables real-time optimization for sequential generation with flexible capacity control. Extensive experiments across diverse datasets and LLMs demonstrate DMW's superiority, outperforming state-of-the-art methods across all criteria. Notably, under severe paraphrasing attacks, DMW achieves a match rate up to 46.35% higher than the best baseline, while maintaining superior text quality.
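To make the distortion-minimization idea concrete, the following is a minimal sketch of generic syndrome coding, the principle underlying syndrome-trellis codes. It is not the PO-STC construction. Given a binary cover sequence `x`, a cost for modifying each position, and a parity-check matrix `H`, it searches for the sequence `y` whose syndrome `H·y` (mod 2) equals the watermark message while minimizing the total cost of the modified positions. All names and values here (`H`, `x`, `costs`, `message`, `embed_min_distortion`) are illustrative assumptions rather than details from the paper. The sketch also uses exhaustive search, which is feasible only for toy sizes, whereas the abstract describes solving the problem as a periodic shortest-path search. Mapping text modifications to bits and costs is abstracted away.

```python
from itertools import product


def syndrome(H, y):
    """Return H*y over GF(2); H is a list of rows, y a list of bits."""
    return [sum(h * b for h, b in zip(row, y)) % 2 for row in H]


def embed_min_distortion(H, x, costs, message):
    """Exhaustively find y with H*y = message that minimizes the summed
    cost of positions where y differs from the cover x.

    Brute force over all 2**len(x) candidates: a toy stand-in for the
    trellis shortest-path search used by syndrome-trellis codes.
    """
    best_y, best_cost = None, float("inf")
    for candidate in product((0, 1), repeat=len(x)):
        if syndrome(H, candidate) != list(message):
            continue  # candidate does not carry the watermark bits
        cost = sum(c for c, xi, yi in zip(costs, x, candidate) if xi != yi)
        if cost < best_cost:
            best_y, best_cost = list(candidate), cost
    return best_y, best_cost


# Hypothetical toy instance: 6 cover positions carrying 2 watermark bits.
H = [[1, 0, 1, 1, 0, 1],
     [0, 1, 1, 0, 1, 1]]
x = [1, 0, 0, 1, 1, 0]                  # cover bits before embedding
costs = [5.0, 1.0, 4.0, 3.0, 0.5, 2.0]  # assumed per-position distortion
message = [1, 0]                        # watermark bits to embed

y, total = embed_min_distortion(H, x, costs, message)
print(y, total)  # [1, 0, 0, 1, 1, 1] 2.0
```

In this toy instance the cheapest valid change modifies only the last position (cost 2.0) rather than two cheaper-looking positions that would not jointly yield the required syndrome. Positions with high cost, which in the framework above would correspond to modifications that hurt robustness or quality, are left untouched whenever a lower-cost alternative exists.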
