TAPE: Tool-Guided Adaptive Planning and Constrained Execution in Language Model Agents

Abstract



Language Model (LM) agents have demonstrated remarkable capabilities in solving tasks that require multiple interactions with the environment. However, they remain vulnerable in environments where a single error often leads to irrecoverable failure, particularly under strict feasibility constraints. We systematically analyze existing agent frameworks and identify imperfect planning and stochastic execution as the primary causes. To address these challenges, we propose Tool-guided Adaptive Planning with constrained Execution (TAPE). TAPE strengthens planning by aggregating multiple candidate plans into a graph and using an external solver to identify a feasible path. During execution, TAPE uses constrained decoding to reduce sampling noise and re-plans adaptively whenever environmental feedback deviates from the intended state. Experiments on Sokoban, ALFWorld, MuSiQue, and GSM8K-Hard show that TAPE consistently outperforms existing frameworks, with particularly large gains in difficult regimes: on average, it improves success rates by 21.0 percentage points on hard settings and by 20.0 percentage points with weaker base models. Code and data are available here.
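The plan-graph, solver, constrained-execution and re-planning loop summarized above can be illustrated with a small Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: each plan is assumed to be a list of (state, action, predicted next state) triples, a breadth-first search over feasible edges stands in for the external solver, greedy selection restricted to an allowed action set stands in for constrained decoding, and `plans_fn`, `step_fn`, `score_fn`, and `is_feasible` are hypothetical callbacks for the LM planner, the environment, the LM's action scores, and the feasibility check.

```python
from collections import defaultdict, deque
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Set, Tuple

State = Hashable
Action = str
Step = Tuple[State, Action, State]  # (state, action, predicted next state)
Graph = Dict[State, Dict[Action, State]]


def build_plan_graph(plans: Sequence[Sequence[Step]]) -> Graph:
    """Merge several candidate plans; identical predicted states become shared nodes."""
    graph: Graph = defaultdict(dict)
    for plan in plans:
        for state, action, nxt in plan:
            graph[state][action] = nxt
    return graph


def solve_path(graph: Graph, start: State, goal: State,
               is_feasible: Callable[[State, Action], bool]) -> Optional[List[Action]]:
    """Stand-in for the external solver: BFS that only follows feasible edges."""
    parent: Dict[State, Tuple[State, Action]] = {}
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            actions: List[Action] = []
            while state != start:  # walk parent pointers back to the start
                state, action = parent[state]
                actions.append(action)
            return actions[::-1]
        for action, nxt in graph.get(state, {}).items():
            if nxt not in seen and is_feasible(state, action):
                seen.add(nxt)
                parent[nxt] = (state, action)
                queue.append(nxt)
    return None  # no feasible path exists in the aggregated graph


def constrained_choice(scores: Dict[Action, float], allowed: Set[Action]) -> Action:
    """Greedy decoding restricted to `allowed`; actions the model did not score get -inf."""
    return max(sorted(allowed), key=lambda a: scores.get(a, float("-inf")))


def run(plans_fn: Callable[[State], Sequence[Sequence[Step]]],  # hypothetical LM planner
        step_fn: Callable[[State, Action], State],              # hypothetical environment
        score_fn: Callable[[State], Dict[Action, float]],       # hypothetical LM action scores
        start: State, goal: State,
        is_feasible: Callable[[State, Action], bool],           # hypothetical constraint check
        max_replans: int = 3) -> Tuple[bool, State]:
    """Plan, execute with constraints, and re-plan when the observed state deviates."""
    state = start
    for _ in range(max_replans + 1):
        graph = build_plan_graph(plans_fn(state))
        path = solve_path(graph, state, goal, is_feasible)
        if path is None:
            return False, state
        for planned in path:
            # Allowing only the planned action removes sampling noise at this step.
            action = constrained_choice(score_fn(state), {planned})
            expected = graph[state][action]
            state = step_fn(state, action)
            if state != expected:
                break  # feedback deviates from the intended state: re-plan from here
        if state == goal:
            return True, state
    return False, state
```

Because identical predicted states merge into one node, the solver in this sketch can combine edges from different candidate plans, so an infeasible step in one plan does not discard the useful parts of the others.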
