LedgerAgent: Structured State for Policy-Adherent Tool-Calling Agents

Abstract

Policy-adherent tool-calling agents in customer-service domains must maintain task state across turns while calling tools and obeying domain policies. Task state consists of the relevant facts, identifiers, constraints, and conditions observed through user interaction and tool calls. In standard agents, task state is not represented separately: observations, tool returns, and policy instructions are placed in the prompt, leaving the agent to reconstruct the relevant state from the prompt each time it decides what to do next. This design makes state management implicit and creates two common failure modes. First, an agent may retrieve the right facts but later ground its decision in stale, missing, or incorrect information. Second, a syntactically valid tool call may still violate a domain policy that depends on the current task state. We introduce LedgerAgent, an inference-time method for tool-calling agents that maintains observed task state in a separate ledger and renders that state into the prompt. The ledger is also used to check state-dependent policy constraints before environment-changing tool calls are executed, blocking policy violations. Across four customer-service domains and a mixed panel of open- and closed-weight models, LedgerAgent improves average pass^k over a standard prompt-based tool-calling approach, with the largest gains under the stricter multi-trial consistency metrics.
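The abstract describes the ledger only at a high level. As a minimal sketch of the general pattern (not LedgerAgent's actual implementation), the code below keeps observed facts in a key-value store that is rendered into the prompt each turn, and runs a guard that evaluates state-dependent policy rules before any environment-changing call executes. The tool names (`cancel_order`, `modify_order`), the fact keys, the `## Task state` header, and both rules are hypothetical retail-style examples chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

# Hypothetical names of tools that change the environment (assumption for illustration).
MUTATING_TOOLS = {"cancel_order", "modify_order"}


@dataclass
class Ledger:
    """Observed task state kept outside the prompt (minimal sketch)."""
    facts: Dict[str, Any] = field(default_factory=dict)

    def record(self, key: str, value: Any) -> None:
        # A newer observation overwrites the older value for the same key,
        # so the rendered state never shows a superseded fact.
        self.facts[key] = value

    def render(self) -> str:
        # Serialize the current state; this text is placed in the prompt each turn.
        lines = [f"- {k}: {v}" for k, v in sorted(self.facts.items())]
        return "## Task state\n" + "\n".join(lines)


# A rule inspects the ledger and a proposed call; it returns a violation message or None.
PolicyRule = Callable[[Ledger, str, Dict[str, Any]], Optional[str]]


def requires_identity_verified(ledger: Ledger, tool: str, args: Dict[str, Any]) -> Optional[str]:
    # Hypothetical rule: no environment change before the user is verified.
    if not ledger.facts.get("user_verified"):
        return "user identity must be verified before changing orders"
    return None


def order_must_be_pending(ledger: Ledger, tool: str, args: Dict[str, Any]) -> Optional[str]:
    # Hypothetical rule: only pending orders may be cancelled.
    if tool == "cancel_order":
        status = ledger.facts.get(f"order:{args.get('order_id')}:status")
        if status != "pending":
            return f"only pending orders can be cancelled (status={status})"
    return None


def guard(ledger: Ledger, tool: str, args: Dict[str, Any], rules: List[PolicyRule]) -> List[str]:
    """Return all violations for a proposed call; an empty list means it may run."""
    if tool not in MUTATING_TOOLS:
        return []  # read-only calls are not checked
    return [msg for rule in rules if (msg := rule(ledger, tool, args)) is not None]


if __name__ == "__main__":
    ledger = Ledger()
    ledger.record("user_verified", True)
    ledger.record("order:W1:status", "delivered")
    rules = [requires_identity_verified, order_must_be_pending]
    print(ledger.render())
    print(guard(ledger, "cancel_order", {"order_id": "W1"}, rules))
    # ['only pending orders can be cancelled (status=delivered)']
```

Returning violation messages instead of raising lets an agent loop feed them back to the model as a tool error and skip execution; the abstract states only that violations are blocked, not how the agent is informed.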
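The results are reported in pass^k. Assuming the paper follows the τ-bench definition, pass^k is the probability that k independent trials of the same task all succeed. With n ≥ k trials per task, c of them successful, the per-task estimate is C(c, k)/C(n, k), averaged over tasks. The helper name `pass_hat_k` is illustrative.

```python
from math import comb
from typing import Sequence


def pass_hat_k(successes_per_task: Sequence[int], n: int, k: int) -> float:
    """Estimate pass^k: chance that k i.i.d. trials of a task all succeed, averaged over tasks.

    successes_per_task[i] is the number of successful trials (out of n) for task i.
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # comb(c, k) is 0 when c < k, so a task with fewer than k successes contributes 0.
    return sum(comb(c, k) / comb(n, k) for c in successes_per_task) / len(successes_per_task)


if __name__ == "__main__":
    successes = [4, 3, 2, 0]  # 4 tasks, n = 4 trials each
    for k in (1, 2, 4):
        print(k, pass_hat_k(successes, n=4, k=k))
```

Since C(c, k)/C(n, k) is a product of factors (c − i)/(n − i) ≤ 1, the estimate can only decrease as k grows; in the example it falls from 0.5625 at k = 1 to 0.25 at k = 4, so agents that succeed only intermittently are penalized more at larger k.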