ProAct: Agentic Lookahead in Interactive Environments

February 5, 2026
Authors: Yangbin Yu, Mingyu Yang, Junyou Li, Yiming Gao, Feiyu Liu, Yijun Yang, Zichuan Lin, Jiafei Lyu, Yicheng Liu, Zhicong Lu, Deheng Ye, Jie Jiang
cs.AI

Abstract

Existing Large Language Model (LLM) agents struggle in interactive environments requiring long-horizon planning, primarily due to compounding errors when simulating future states. To address this, we propose ProAct, a framework that enables agents to internalize accurate lookahead reasoning through a two-stage training paradigm. First, we introduce Grounded LookAhead Distillation (GLAD), where the agent undergoes supervised fine-tuning on trajectories derived from environment-based search. By compressing complex search trees into concise, causal reasoning chains, the agent learns the logic of foresight without the computational overhead of inference-time search. Second, to further refine decision accuracy, we propose the Monte-Carlo Critic (MC-Critic), a plug-and-play auxiliary value estimator designed to enhance policy-gradient algorithms like PPO and GRPO. By leveraging lightweight environment rollouts to calibrate value estimates, MC-Critic provides a low-variance signal that facilitates stable policy optimization without relying on expensive model-based value approximation. Experiments on both stochastic (e.g., 2048) and deterministic (e.g., Sokoban) environments demonstrate that ProAct significantly improves planning accuracy. Notably, a 4B-parameter model trained with ProAct outperforms all open-source baselines and rivals state-of-the-art closed-source models, while demonstrating robust generalization to unseen environments. The code and models are available at https://github.com/GreatX3/ProAct
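To make the GLAD idea concrete, here is a minimal sketch of how a search tree might be flattened into a linear causal reasoning chain suitable as a supervised fine-tuning target. The node structure, field names (`SearchNode`, `best_path`, `to_reasoning_chain`), and the greedy extraction rule are illustrative assumptions, not the paper's actual data format or distillation procedure.

```python
# Illustrative sketch (not ProAct's implementation): compress the best branch
# of an environment search tree into a concise reasoning-chain string for SFT.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SearchNode:
    state: str                     # textual description of the env state
    action: Optional[str] = None   # action that led to this node
    value: float = 0.0             # backed-up value from search
    children: List["SearchNode"] = field(default_factory=list)

def best_path(node: SearchNode) -> List[SearchNode]:
    """Follow the highest-value child at each depth (greedy extraction)."""
    path = [node]
    while node.children:
        node = max(node.children, key=lambda c: c.value)
        path.append(node)
    return path

def to_reasoning_chain(root: SearchNode) -> str:
    """Render the extracted branch as a causal chain, dropping all other branches."""
    steps = []
    for node in best_path(root)[1:]:
        steps.append(f"If I take '{node.action}', the state becomes "
                     f"{node.state} (estimated value {node.value:.2f}).")
    return " ".join(steps)

# Usage: a two-ply toy tree over 2048-style moves.
root = SearchNode("start", children=[
    SearchNode("board A", action="left", value=0.3),
    SearchNode("board B", action="up", value=0.8, children=[
        SearchNode("board C", action="left", value=0.9),
    ]),
])
print(to_reasoning_chain(root))
```

The point of the compression is that the fine-tuned agent sees only the distilled causal chain, so at inference time it can reason ahead without rebuilding the search tree.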
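Similarly, the MC-Critic can be read as a rollout-based baseline for policy-gradient advantages. The sketch below shows the generic technique under stated assumptions: a clonable environment, a hypothetical `mc_value` helper, and arbitrary constants `k`, `gamma`, and `horizon`; none of these names come from the paper.

```python
# Illustrative sketch (not ProAct's implementation): estimate V(s) by averaging
# discounted returns over a few lightweight rollouts from a cloned env state,
# then use it as a low-variance baseline when forming PPO/GRPO advantages.
import random
from typing import Callable, List, Tuple

class Env:
    """Toy 1-D chain: actions move left/right, reward 1.0 on reaching state 5."""
    def __init__(self, state: int = 0):
        self.state = state
    def clone(self) -> "Env":
        return Env(self.state)
    def step(self, action: int) -> Tuple[int, float, bool]:
        self.state += 1 if action == 1 else -1
        done = self.state == 5
        return self.state, (1.0 if done else 0.0), done

def mc_value(env: Env,
             policy: Callable[[int], int],
             k: int = 8, gamma: float = 0.99, horizon: int = 20) -> float:
    """Average discounted return over k rollouts, simulated on env clones."""
    returns: List[float] = []
    for _ in range(k):
        sim = env.clone()            # roll out without disturbing the real env
        g, discount, state = 0.0, 1.0, sim.state
        for _ in range(horizon):
            state, reward, done = sim.step(policy(state))
            g += discount * reward
            discount *= gamma
            if done:
                break
        returns.append(g)
    return sum(returns) / k          # variance shrinks as k grows

def random_policy(state: int) -> int:
    return random.choice([0, 1])

if __name__ == "__main__":
    env = Env(state=3)
    baseline = mc_value(env, random_policy)
    # In a policy-gradient update, advantage = observed return - baseline;
    # the rollout baseline calibrates (or replaces) a learned value head.
    print(f"MC value estimate at state 3: {baseline:.3f}")
```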