Scaling Policy Compliance Assessment in Language Models with Policy Reasoning Traces
September 27, 2025
Authors: Joseph Marvin Imperial, Harish Tayyar Madabushi
cs.AI
Abstract
Policy compliance assessment is a fundamental task of evaluating whether an
input case strictly complies with a set of human-defined rules, more generally
known as policies. In practice, human experts follow a systematic, step-by-step
process to identify violations with respect to specific stipulations outlined
in the policy. However, such documentation of gold-standard, expert-level
reasoning processes is costly to acquire. In this paper, we introduce Policy
Reasoning Traces (PRT), a form of specialized generated reasoning chains that
serve as a reasoning bridge to improve an LLM's policy compliance assessment
capabilities. Our empirical evaluations demonstrate that the use of PRTs for
both inference-time and training-time scenarios significantly enhances the
performance of open-weight and commercial models, setting a new
state-of-the-art for HIPAA and GDPR policies. Beyond accuracy gains, we also
highlight how PRTs improve an LLM's ability to accurately cite policy clauses
and how they influence compliance decisions, as reflected in their high
utilization within the raw chains of thought.