Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models
January 13, 2026
Authors: Youwei Liu, Jian Wang, Hanlin Wang, Beichen Guo, Wenjie Li
cs.AI
Abstract
Recent advances in world models have shown promise for modeling the future dynamics of environmental states, enabling agents to reason and act without accessing real environments. Current methods mainly perform single-step or fixed-horizon rollouts, leaving their potential for complex task planning under-exploited. We propose Imagine-then-Plan (ITP), a unified framework for agent learning via lookahead imagination, in which an agent's policy model interacts with the learned world model to yield multi-step "imagined" trajectories. Since the imagination horizon may vary across tasks and stages, we introduce a novel adaptive lookahead mechanism that trades off the ultimate goal against task progress. The resulting imagined trajectories provide rich signals about future consequences, such as achieved progress and potential conflicts, which are fused with current observations to formulate a partially observable and imaginable Markov decision process that guides policy learning. We instantiate ITP with both training-free and reinforcement-trained variants. Extensive experiments across representative agent benchmarks demonstrate that ITP significantly outperforms competitive baselines. Further analyses validate that our adaptive lookahead substantially enhances agents' reasoning capability, offering valuable insights for addressing broader, complex tasks.
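The adaptive lookahead idea can be sketched in miniature as follows. This is an illustrative assumption about how such a mechanism might work, not the paper's implementation: `policy`, `step`, and `progress` are hypothetical callables standing in for the policy model, the world model's transition function, and a task-progress estimate in [0, 1], and the particular stopping rule (goal reached or progress stalls) is one plausible way to trade off the ultimate goal against task progress.

```python
def adaptive_rollout(policy, step, progress, state, max_horizon=8, eps=0.05):
    """Imagine a multi-step trajectory with a world model, cutting the
    horizon adaptively. All names here are illustrative placeholders."""
    trajectory = []
    prev = progress(state)
    for _ in range(max_horizon):
        action = policy(state)
        state = step(state, action)       # imagined transition, no real env
        cur = progress(state)
        trajectory.append((action, state))
        # Stop imagining once the goal is (nearly) reached or progress stalls.
        if cur >= 1.0 or cur - prev < eps:
            break
        prev = cur
    return trajectory

# Toy 1-D task: walk from state 0 toward goal 10; the policy stalls at 5,
# so the rollout stops well before the max_horizon cap.
traj = adaptive_rollout(
    policy=lambda s: 1 if s < 5 else 0,
    step=lambda s, a: s + a,
    progress=lambda s: s / 10,
    state=0,
)
```

The trajectory returned here carries exactly the kind of future-consequence signal the abstract mentions: how far progress got before stalling, and at which imagined state.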