DAR: Deontic Reasoning with Agentic Harnesses

Abstract

Deontic reasoning is the task of answering questions by applying explicit rules and policies to case-specific facts, for example computing tax liability under a statute or determining the outcome of an immigration appeal. A key technical challenge for LLM-based deontic reasoning is that the relevant ruleset can be long and heavily cross-referenced, so models may fail to locate the rules needed for a particular reasoning step. We introduce Deontic Agentic Reasoning (DAR), an agentic reasoning setup in which the model interacts with the statutes on demand. We evaluate DAR under multiple harnesses on hard subsets of DeonticBench. Across these settings, we find that agentic harnesses can push the frontier on deontic reasoning tasks, but the improvements are not uniform: weaker models often degrade on numerical tasks while consuming far more tokens.