Cross-Modal Prompt Inversion: Unifying Threats to Text and Image Generative AI Models

Dayong Ye; Tianqing Zhu; Feng He; Bo Liu; Minhui Xue; Wanlei Zhou

Dayong Ye and Tianqing Zhu, City University of Macau; Feng He and Bo Liu, University of Technology Sydney; Minhui Xue, CSIRO's Data61; Wanlei Zhou, City University of Macau

The rise of generative models, spanning both text-to-text and text-to-image modalities, has underscored the significance of 'prompt engineering', a technique critical for improving the quality of model outputs. Crafting high-quality prompts is not only time-intensive but also economically valuable, making them prime targets for attackers. Recent research has revealed that these prompts can be stolen through a technique known as prompt inversion, which reconstructs prompts solely by analyzing model outputs. However, existing studies are typically confined to either text-to-text or text-to-image models and do not transfer across them, which limits their real-world utility. This gap raises a crucial question: Is there a unified approach capable of addressing both model types? In this paper, we present the first comprehensive study of a unified prompt inversion approach that targets both text and image models. Our approach involves two model-agnostic phases: (1) training an inversion model to generate initial prompt approximations from model outputs, and (2) using reinforcement learning to fine-tune the inversion model for enhanced accuracy. We further extend our investigation to the text-to-video modality to demonstrate the broad generalizability of our approach. Experimental results show that our approach outperforms existing state-of-the-art methods, which are typically optimized for a single model type. The source code is available at: https://zenodo.org/records/15603408.
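The abstract describes phase (2) only at a high level: the inversion model is fine-tuned with reinforcement learning so that its prompts better reproduce the observed outputs. As a rough illustration of that general idea, and not the authors' implementation, the Python sketch below refines a softmax policy over a small, fixed set of candidate prompts. Everything in it is an assumption made for illustration: `toy_generator` is a hypothetical stand-in for the black-box target model, the reward is a word-overlap (Jaccard) similarity between the regenerated and observed outputs, and `phase1_scores` are made-up initial scores standing in for the phase (1) inversion model. Because the candidate set is tiny, the sketch computes the policy gradient of the expected reward exactly instead of estimating it from sampled prompts, as REINFORCE-style training would.

```python
import math


def toy_generator(prompt: str) -> str:
    # HYPOTHETICAL stand-in for the black-box target model: it lowercases the
    # prompt and appends a fixed style suffix.
    return " ".join(prompt.lower().split()) + " in high detail"


def jaccard(a: str, b: str) -> float:
    # Word-set overlap in [0, 1], used here as an ASSUMED reward signal.
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def refine(candidates: list[str], init_logits: list[float],
           observed_output: str, steps: int = 200, lr: float = 0.5) -> str:
    # Reward of each candidate: how closely re-querying the target model with
    # it reproduces the observed output.
    rewards = [jaccard(toy_generator(c), observed_output) for c in candidates]
    logits = list(init_logits)
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * r for p, r in zip(probs, rewards))
        # Exact policy gradient for a softmax policy:
        # dE[r]/d(logit_j) = p_j * (r_j - E[r]).
        logits = [z + lr * p * (r - expected)
                  for z, p, r in zip(logits, probs, rewards)]
    best = max(range(len(candidates)), key=lambda j: logits[j])
    return candidates[best]


secret = "A red fox in snow"
observed = toy_generator(secret)  # the attacker sees only this output
candidates = ["a red fox in snow", "a red dog in snow", "a blue car"]
phase1_scores = [0.0, 0.5, 0.2]  # HYPOTHETICAL phase-1 scores; top pick is wrong
print(refine(candidates, phase1_scores, observed))  # → a red fox in snow
```

Here the made-up phase (1) scores favor the wrong prompt, and the gradient steps shift probability toward the candidate whose regenerated output matches the observation. A realistic version would instead generate prompts token by token with a learned model, sample them, and score them with a similarity measure suited to the target modality.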

Category: Short Presentation

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX

@inproceedings {309620,
  author = {Dayong Ye and Tianqing Zhu and Feng He and Bo Liu and Minhui Xue and Wanlei Zhou},
  title = {{Cross-Modal} Prompt Inversion: Unifying Threats to Text and Image Generative {AI} Models},
  booktitle = {34th USENIX Security Symposium (USENIX Security 25)},
  year = {2025},
  isbn = {978-1-939133-52-6},
  address = {Seattle, WA},
  pages = {2303--2322},
  url = {https://www.usenix.org/conference/usenixsecurity25/presentation/ye-inversion},
  publisher = {USENIX Association},
  month = aug
}
