Scaling Policy Compliance Assessment in Language Models with Policy Reasoning Traces

Abstract

Policy compliance assessment is the fundamental task of evaluating whether an input case strictly complies with a set of human-defined rules, more generally known as policies. In practice, human experts follow a systematic, step-by-step process to identify violations of specific stipulations outlined in the policy. However, documentation of such gold-standard, expert-level reasoning processes is costly to acquire. In this paper, we introduce Policy Reasoning Traces (PRT), a form of specialized generated reasoning chains that serve as a reasoning bridge to improve an LLM's policy compliance assessment capabilities. Our empirical evaluations demonstrate that using PRTs in both inference-time and training-time scenarios significantly enhances the performance of open-weight and commercial models, setting a new state of the art for HIPAA and GDPR policies. Beyond accuracy gains, we also highlight how PRTs can improve an LLM's ability to accurately cite policy clauses and how they influence compliance decisions, as reflected in their high utilization in the raw chains of thought.
