Are CAPTCHAs Still Bot-hard? Generalized Visual CAPTCHA Solving with Agentic Vision Language Model

Xiwen Teoh; Yun Lin; Siqi Li; Ruofan Liu; Avi Sollomoni; Yaniv Harel; Jin Song Dong

Xiwen Teoh, Shanghai Jiao Tong University and National University of Singapore; Yun Lin, Shanghai Jiao Tong University; Siqi Li and Ruofan Liu, National University of Singapore; Avi Sollomoni and Yaniv Harel, Tel Aviv University; Jin Song Dong, National University of Singapore

Visual CAPTCHAs, such as reCAPTCHA v2, hCaptcha, and GeeTest, are mainstream security mechanisms for deterring bots online, based on the assumption that their visual challenges are bot-hard but human-friendly. While many deep-learning-based solvers have been designed and trained to solve a specific type of visual challenge in a CAPTCHA, vendors can easily switch to out-of-distribution visual challenges of the same type, or even to new types of challenges, at very low cost. However, the emergence of general-purpose AI models (e.g., ChatGPT) challenges the bot-hard assumption of existing visual challenges, potentially compromising the reliability of visual CAPTCHAs.

In this work, we report the first generalized visual CAPTCHA solver, Halligan, built upon a state-of-the-art vision language model (VLM), which can effectively solve unseen visual challenges in CAPTCHAs without any adaptation. Our rationale is that a visual challenge can be reduced to a search problem in which (i) its instruction is transformed into an optimization objective and (ii) its body is transformed into a search space for the objective. With well-designed prompts built upon known VLMs, the transformation can be generalized to unseen visual challenges. Our extensive experiments show that Halligan is a game-changer for the established practice of adopting visual CAPTCHAs: it achieves a solving rate of 60.7% on 2,600 challenges belonging to 26 types of visual CAPTCHAs. Further, we use Halligan to infiltrate human-driven CAPTCHA farms, achieving an average solving rate of 70.6% on previously unseen visual challenges from CAPTCHAs in the wild over a 30-day period. Based on the experimental results, we further shed light on puzzle-less anti-bot alternatives in this era.
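The reduction described in the abstract (instruction as objective, challenge body as search space) can be illustrated with a toy sketch. This is not Halligan's implementation: the names `Candidate`, `make_objective`, and `solve` are hypothetical, and a simple keyword match stands in for the VLM that would score visual content in a real solver.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Candidate:
    """One point in the search space, e.g. a selectable tile (hypothetical type)."""
    label: str    # identifier of the interactable element
    content: str  # text stand-in for the visual content a VLM would inspect


def make_objective(instruction: str) -> Callable[[Candidate], float]:
    """Turn the instruction into a scoring function.

    Assumption: the target concept is the last word of the instruction.
    A real solver would prompt a VLM here instead of matching keywords.
    """
    target = instruction.lower().rstrip(".!?").split()[-1]
    return lambda c: 1.0 if target in c.content.lower() else 0.0


def solve(instruction: str, candidates: Sequence[Candidate],
          threshold: float = 0.5) -> List[str]:
    """Search the candidates and keep those that satisfy the objective."""
    objective = make_objective(instruction)
    return [c.label for c in candidates if objective(c) >= threshold]


if __name__ == "__main__":
    tiles = [
        Candidate("t0", "red bus"),
        Candidate("t1", "tree"),
        Candidate("t2", "school bus"),
    ]
    print(solve("Select all images with a bus", tiles))
```

The point of the sketch is the separation of concerns: the objective is derived only from the instruction, and the search space only from the challenge body, so a new challenge type changes the inputs rather than the solving loop.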

Category:

Short Presentation

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone.

BibTeX

@inproceedings {309468,
author = {Xiwen Teoh and Yun Lin and Siqi Li and Ruofan Liu and Avi Sollomoni and Yaniv Harel and Jin Song Dong},
title = {Are {CAPTCHAs} Still Bot-hard? Generalized Visual {CAPTCHA} Solving with Agentic Vision Language Model},
booktitle = {34th USENIX Security Symposium (USENIX Security 25)},
year = {2025},
isbn = {978-1-939133-52-6},
address = {Seattle, WA},
pages = {3747--3766},
url = {https://www.usenix.org/conference/usenixsecurity25/presentation/teoh},
publisher = {USENIX Association},
month = aug
}

