LegalHalluLens: Typed Hallucination Auditing and Calibrated Multi-Agent Debate for Trustworthy Legal AI

Abstract

AI systems deployed in legal workflows hallucinate at rates that aggregate metrics report at approximately 52%, but this average conceals where errors concentrate and in which direction they run, leaving compliance officers without an actionable signal for trustworthy deployment. We present LegalHalluLens, an auditing framework with three components: typed hallucination profiles across four legally motivated claim categories (numeric, temporal, obligation/entitlement, factual) over CUAD (Hendrycks et al., 2021); a Risk Direction Index (RDI) that reduces omission-versus-invention bias to a single scalar comparable across deployments; and a typed debate pipeline calibrated to both magnitudes and directions. Across 510 contracts and 249,252 clause-level instances, we measure a within-model gap of approximately 38-40 percentage points between obligation/numeric and temporal claims that aggregate reporting hides, and show that two systems with matched 52% rates can carry opposite RDIs. The debate pipeline reduces fabricated detections by 45%, with per-category gains tracking the diagnosis, and matches commercial APIs with a substantially smaller backbone (4B active parameters). Typed profiles and RDI surface failure modes that aggregate metrics hide; we further show that these diagnostics serve as calibration inputs for multi-agent debate pipelines, where Skeptic challenges and asymmetric gates targeted at measured failure modes outperform generically tuned debate. The framework supports direction-aware procurement, accountability, and agent design for legal AI deployed in the wild.
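The claim that two systems with matched aggregate rates can carry opposite RDIs can be illustrated with a small sketch. The abstract does not state the RDI formula, so the normalized difference used below (inventions minus omissions, divided by total errors) is an assumption made only for illustration, not the paper's definition. The `ErrorCounts` type, the function names, and all counts are likewise hypothetical toy values, not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorCounts:
    """Hypothetical per-system error tally (toy numbers, not from the paper)."""
    total: int       # number of audited claims
    omissions: int   # supported claims the system failed to report
    inventions: int  # claims the system asserted without support


def hallucination_rate(c: ErrorCounts) -> float:
    """Aggregate rate: all errors over all audited claims."""
    return (c.omissions + c.inventions) / c.total


def direction_index(c: ErrorCounts) -> float:
    """ASSUMED stand-in for RDI: (inventions - omissions) / errors.

    Ranges from -1 (all errors are omissions) to +1 (all are inventions).
    The paper's actual RDI definition may differ.
    """
    errors = c.omissions + c.inventions
    if errors == 0:
        return 0.0  # no errors, so no directional bias
    return (c.inventions - c.omissions) / errors


# Two toy systems with the same 52% aggregate rate but mirrored error mixes.
system_a = ErrorCounts(total=100, omissions=39, inventions=13)
system_b = ErrorCounts(total=100, omissions=13, inventions=39)

for name, counts in (("A", system_a), ("B", system_b)):
    print(name, hallucination_rate(counts), direction_index(counts))
```

Under this assumed definition both systems report a 0.52 aggregate rate, while system A leans toward omission (index -0.5) and system B toward invention (index +0.5), which is the kind of divergence a single aggregate number cannot show.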