Shuai Cheng, Shu Meng, Haitao Xu, and Haoran Zhang, The State Key Laboratory of Blockchain and Data Security, Zhejiang University; Shuai Hao, Old Dominion University; Chuan Yue, Colorado School of Mines; Wenrui Ma, Zhejiang Gongshang University; Meng Han and Fan Zhang, Zhejiang University; Zhao Li, Zhejiang University and Hangzhou Yugu Technology
Large Language Models (LLMs) exhibit strong natural language processing capabilities but also pose significant privacy risks, particularly the leakage of Personally Identifiable Information (PII) embedded in their training data. Existing PII extraction methods either have low success rates or are impractical for large-scale extraction. In this study, we propose a novel PII extraction approach based on enhanced few-shot learning techniques, which achieves efficient and cost-effective PII retrieval without relying on fine-tuning or jailbreaking. We evaluate our approach on both open-source and closed-source LLMs. The experimental results show that, for non-targeted PII extraction, the attack success rate reaches 48.9%, that is, about one authentic PII per two queries, at a cost of $0.012 per PII. For targeted PII extraction, our approach surpasses state-of-the-art methods, achieving a 10% to 60% improvement in attack success rates. Additionally, an exploratory analysis of the origins of the extracted PII reveals the significant scale of potential privacy breaches. Our work advances the understanding of LLM-induced privacy risks and underscores the vulnerability of partial personal data to large-scale exploitation.
author = {Shuai Cheng and Shu Meng and Haitao Xu and Haoran Zhang and Shuai Hao and Chuan Yue and Wenrui Ma and Meng Han and Fan Zhang and Zhao Li},
title = {Effective {PII} Extraction from {LLMs} through Augmented {Few-Shot} Learning},
booktitle = {34th USENIX Security Symposium (USENIX Security 25)},
year = {2025},
isbn = {978-1-939133-52-6},
address = {Seattle, WA},
pages = {8155--8173},
url = {https://www.usenix.org/conference/usenixsecurity25/presentation/cheng-shuai},
publisher = {USENIX Association},
month = aug
}
