AUDITFLOW: An Executable Symbolic Environment for Structured Financial Report Verification

Abstract

Structured financial audit verification is difficult for language-model agents because correctness depends on structured evidence rather than on text alone. A model must link reported facts to taxonomy concepts, traverse calculation or dimensional relations, and recompute expected values before applying an audit rule. We propose AuditFlow, a graph-grounded multi-agent framework that separates adaptive search from deterministic verification. AuditFlow builds a symbolic environment from a static US-GAAP taxonomy graph and a dynamic XBRL filing graph, and exposes it through typed tools for fact retrieval, taxonomy traversal, numerical checking, and rule evaluation. Two junior auditors inspect each case, one from a regulatory view and one from an evidentiary view, while a senior auditor resolves disagreements and can request further investigation. The final reports are fused through evidential aggregation to produce an audit verdict, an expected value, an evidence trail, and a trustworthiness score. On a FinMR sample built from FinAuditing, AuditFlow reaches 82.09% joint audit accuracy with GPT-5.5, exceeding the strongest baseline by 14.93 percentage points. Removing the deterministic checks reduces accuracy to 17.91%, indicating that the symbolic environment performs verification steps that the model cannot reliably replace.
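
To make the idea of a deterministic check concrete, the following is a minimal sketch of one numerical check: recomputing a parent concept from its calculation children and comparing the result with the reported fact. It is an illustration under stated assumptions, not AuditFlow's actual tool interface. The names `CalcArc`, `expected_value`, and `check_calculation`, the tolerance, the verdict labels, and the sample figures are all hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CalcArc:
    """Hypothetical calculation relation: the parent receives weight * child."""
    parent: str
    child: str
    weight: Decimal  # typically +1 or -1


def expected_value(parent: str, arcs: List[CalcArc],
                   facts: Dict[str, Decimal]) -> Optional[Decimal]:
    """Sum the weighted children of `parent` that are reported in `facts`.

    Returns None when no child of `parent` has a reported value.
    """
    terms = [arc.weight * facts[arc.child]
             for arc in arcs
             if arc.parent == parent and arc.child in facts]
    return sum(terms, Decimal("0")) if terms else None


def check_calculation(parent: str, arcs: List[CalcArc],
                      facts: Dict[str, Decimal],
                      tolerance: Decimal = Decimal("0.5")) -> dict:
    """Compare the reported parent value with the recomputed one.

    The tolerance and the verdict labels are assumptions for illustration.
    """
    expected = expected_value(parent, arcs, facts)
    reported = facts.get(parent)
    if expected is None or reported is None:
        verdict = "insufficient_evidence"
    elif abs(reported - expected) <= tolerance:
        verdict = "consistent"
    else:
        verdict = "inconsistent"
    return {"verdict": verdict, "expected": expected, "reported": reported}


if __name__ == "__main__":
    # Made-up figures; the concept names follow US-GAAP naming for readability.
    arcs = [
        CalcArc("us-gaap:Assets", "us-gaap:AssetsCurrent", Decimal("1")),
        CalcArc("us-gaap:Assets", "us-gaap:AssetsNoncurrent", Decimal("1")),
    ]
    facts = {
        "us-gaap:Assets": Decimal("1600"),
        "us-gaap:AssetsCurrent": Decimal("600"),
        "us-gaap:AssetsNoncurrent": Decimal("900"),
    }
    print(check_calculation("us-gaap:Assets", arcs, facts))
```

Because the comparison is plain arithmetic over retrieved facts, the result is the same on every run, which is the property the abstract attributes to the symbolic environment in contrast to the agents' adaptive search.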