FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments

Abstract

Financial agents powered by large language models (LLMs) are increasingly deployed for investment analysis, risk assessment, and automated decision-making. Their abilities to plan, invoke tools, and manipulate mutable state introduce new security risks in high-stakes, highly regulated financial environments. However, existing safety evaluations largely focus on language-model-level content compliance or abstract agent settings, and fail to capture execution-grounded risks arising from real operational workflows and state-changing actions. To bridge this gap, we propose FinVault, the first execution-grounded security benchmark for financial agents. It comprises 31 regulatory case-driven sandbox scenarios with state-writable databases and explicit compliance constraints, together with 107 real-world vulnerabilities and 963 test cases that systematically cover prompt injection, jailbreaking, and financially adapted attacks, as well as benign inputs for false-positive evaluation. Experimental results reveal that existing defense mechanisms remain ineffective in realistic financial agent settings: average attack success rates (ASR) still reach up to 50.0% on state-of-the-art models and remain non-negligible even for the most robust systems (ASR 6.7%). These results highlight the limited transferability of current safety designs and the need for stronger finance-specific defenses. Our code is available at https://github.com/aifinlab/FinVault.
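
The abstract reports attack success rates and a false-positive evaluation on benign inputs without spelling out how either is computed. As a minimal sketch, assuming ASR is the fraction of attack cases in which the agent carried out a compliance-violating action, and the false-positive rate is the fraction of benign cases the agent wrongly refused, the two metrics could be computed as follows. The `CaseResult` record and its field names are hypothetical illustrations and are not taken from the FinVault code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CaseResult:
    """Outcome of one test case (hypothetical schema, not FinVault's)."""
    category: str    # e.g. "prompt_injection", "jailbreak", "benign"
    is_attack: bool  # True for adversarial cases, False for benign inputs
    violated: bool   # agent performed a compliance-violating state change
    refused: bool    # agent declined or blocked the request


def attack_success_rate(results: Iterable[CaseResult]) -> float:
    """Fraction of attack cases that ended in a violating action."""
    attacks = [r for r in results if r.is_attack]
    if not attacks:
        return 0.0
    return sum(r.violated for r in attacks) / len(attacks)


def false_positive_rate(results: Iterable[CaseResult]) -> float:
    """Fraction of benign cases that the agent wrongly refused."""
    benign = [r for r in results if not r.is_attack]
    if not benign:
        return 0.0
    return sum(r.refused for r in benign) / len(benign)


# Toy data for illustration only; these are not benchmark results.
sample = [
    CaseResult("prompt_injection", True, True, False),
    CaseResult("prompt_injection", True, False, True),
    CaseResult("jailbreak", True, True, False),
    CaseResult("jailbreak", True, False, True),
    CaseResult("benign", False, False, False),
    CaseResult("benign", False, False, True),
]
asr = attack_success_rate(sample)
fpr = false_positive_rate(sample)
```

Reporting both numbers together matters because a defense that refuses everything would drive ASR to zero while making the agent useless on legitimate requests.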