Scaling Policy Compliance Assessment in Language Models with Policy Reasoning Traces
September 27, 2025
Authors: Joseph Marvin Imperial, Harish Tayyar Madabushi
cs.AI
Abstract
Policy compliance assessment is a fundamental task of evaluating whether an
input case strictly complies with a set of human-defined rules, more generally
known as policies. In practice, human experts follow a systematic, step-by-step
process to identify violations with respect to specific stipulations outlined
in the policy. However, such documentation of gold-standard, expert-level
reasoning processes is costly to acquire. In this paper, we introduce Policy
Reasoning Traces (PRT), a specialized form of generated reasoning chain that
serves as a reasoning bridge to improve an LLM's policy compliance assessment
capabilities. Our empirical evaluations demonstrate that the use of PRTs for
both inference-time and training-time scenarios significantly enhances the
performance of open-weight and commercial models, setting a new
state-of-the-art for HIPAA and GDPR policies. Beyond accuracy gains, we also
highlight how PRTs can improve an LLM's ability to accurately cite policy
clauses, and how they influence compliance decisions, as evidenced by their
high utilization within the models' raw chains of thought.
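The inference-time use of PRTs described in the abstract can be pictured as prompt assembly: a case is paired with the relevant policy clauses and a step-by-step reasoning trace before the model is asked for a compliance decision. The sketch below is a minimal illustration under assumed formats; the template, the HIPAA clause ID, and the trace wording are hypothetical, not the paper's actual PRT structure.

```python
# Minimal sketch of inference-time PRT prompting.
# Assumptions (not from the paper): prompt template, the example clause
# ID 164.502(a), and the trace steps are all illustrative.

def build_prt_prompt(policy_clauses, case_text, trace_steps):
    """Assemble a prompt pairing a case with policy clauses and a
    policy reasoning trace (numbered, clause-grounded steps)."""
    clause_block = "\n".join(
        f"[{cid}] {text}" for cid, text in policy_clauses.items()
    )
    trace_block = "\n".join(
        f"Step {i}: {step}" for i, step in enumerate(trace_steps, start=1)
    )
    return (
        "Policy clauses:\n" + clause_block + "\n\n"
        "Case:\n" + case_text + "\n\n"
        "Reasoning trace:\n" + trace_block + "\n\n"
        "Decision (compliant / non-compliant), citing clause IDs:"
    )

clauses = {
    "164.502(a)": "A covered entity may not use or disclose PHI "
                  "except as permitted or required.",
}
trace = [
    "Identify whether the case involves protected health information (PHI).",
    "Check each use or disclosure against the permitted-use clauses.",
    "Cite the specific clause that is satisfied or violated.",
]
prompt = build_prt_prompt(
    clauses,
    "A clinic emails patient records to a third-party marketer.",
    trace,
)
print(prompt)
```

The assembled prompt would then be sent to an open-weight or commercial LLM; the same trace text could instead serve as a fine-tuning target in the training-time scenario the abstract mentions.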