MCA-Bench: A Multimodal Benchmark for Evaluating CAPTCHA Robustness Against VLM Attacks

Abstract

As automated attack techniques rapidly advance, CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) remain a critical defense against malicious bots. Existing CAPTCHA schemes span a diverse range of modalities, from static distorted text and obfuscated images to interactive clicks, sliding puzzles, and logic-based questions, yet the community still lacks a unified, large-scale, multimodal benchmark for rigorously evaluating their security robustness. To address this gap, we introduce MCA-Bench, a comprehensive and reproducible benchmarking suite that integrates heterogeneous CAPTCHA types into a single evaluation protocol. Using a shared vision-language model (VLM) backbone, we fine-tune a specialized cracking agent for each CAPTCHA category, enabling consistent cross-modal assessment. Extensive experiments show that MCA-Bench maps the vulnerability spectrum of modern CAPTCHA designs under varied attack settings and provides the first quantitative analysis of how challenge complexity, interaction depth, and model solvability interrelate. Based on these findings, we propose three actionable design principles and identify key open challenges, laying the groundwork for systematic CAPTCHA hardening, fair benchmarking, and broader community collaboration. Datasets and code are available online.