TAPE: Tool-Guided Adaptive Planning and Constrained Execution in Language Model Agents

Abstract

Language Model (LM) agents have demonstrated remarkable capabilities in solving tasks that require multiple interactions with the environment. However, they remain vulnerable in environments where a single error often leads to irrecoverable failure, particularly under strict feasibility constraints. We systematically analyze existing agent frameworks and identify imperfect planning and stochastic execution as the primary causes. To address these challenges, we propose Tool-guided Adaptive Planning with constrained Execution (TAPE). TAPE enhances planning capability by aggregating multiple plans into a graph and employing an external solver to identify a feasible path. During execution, TAPE employs constrained decoding to reduce sampling noise and adaptively re-plans whenever environmental feedback deviates from the intended state. Experiments across Sokoban, ALFWorld, MuSiQue, and GSM8K-Hard demonstrate that TAPE consistently outperforms existing frameworks, with particularly large gains on hard settings: success rates improve by 21.0 percentage points on average on hard settings and by 20.0 percentage points on average for weaker base models. Code and data are available here.
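As a rough illustration of the planning idea summarized above (aggregating multiple plans into a graph and searching it for a feasible path), the following is a minimal sketch, not the authors' implementation. The abstract does not specify the graph representation or the solver, so everything here is an assumption made for illustration: the `(state, action, next_state)` step format, the hypothetical names `aggregate_plans`, `find_feasible_path`, and `is_feasible`, and the use of breadth-first search as a stand-in for the external solver.

```python
from collections import deque


def aggregate_plans(plans):
    """Merge candidate plans into one directed graph.

    Assumed format: each plan is a list of (state, action, next_state) steps.
    The graph maps state -> {action: next_state}.
    """
    graph = {}
    for plan in plans:
        for state, action, next_state in plan:
            graph.setdefault(state, {})[action] = next_state
            graph.setdefault(next_state, {})
    return graph


def find_feasible_path(graph, start, goal, is_feasible):
    """Breadth-first search standing in for an external solver.

    Returns the list of actions from start to goal that only visits
    states accepted by is_feasible, or None if no such path exists.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, actions = queue.popleft()
        if state == goal:
            return actions
        for action, nxt in graph.get(state, {}).items():
            if nxt not in visited and is_feasible(nxt):
                visited.add(nxt)
                queue.append((nxt, actions + [action]))
    return None


# Two hypothetical sampled plans; the first passes through an infeasible state.
plans = [
    [("s0", "push_left", "dead"), ("dead", "push_up", "goal")],
    [("s0", "move_up", "s1"), ("s1", "push_right", "goal")],
]
graph = aggregate_plans(plans)
path = find_feasible_path(graph, "s0", "goal", lambda s: s != "dead")
print(path)
```

In this toy setting, the search skips the plan that routes through the infeasible state and returns the actions of the other plan. An agent following the abstract's description would then execute those actions and rebuild the graph whenever the observed state departs from the expected one.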
