DAR: Deontic Reasoning via Agentic Control

Abstract

Deontic reasoning is the task of answering questions by applying explicit rules and policies to case-specific facts, for example computing tax liability under a statute or determining the outcome of an immigration appeal. A key technical challenge for LLM-based deontic reasoning is that the relevant ruleset can be long and heavily cross-referenced, so models may fail to locate the rules needed for a particular reasoning step. We introduce Deontic Agentic Reasoning (DAR), an agentic reasoning setup in which the model interacts with the statutes on demand. We evaluate DAR under multiple harnesses on hard subsets of DeonticBench. Across these settings, we find that agentic harnesses can push the frontier on deontic reasoning tasks, but the improvements are not uniform: weaker models often degrade on numerical tasks while consuming far more tokens.