

Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models

January 13, 2026
作者: Youwei Liu, Jian Wang, Hanlin Wang, Beichen Guo, Wenjie Li
cs.AI

Abstract

Recent advances in world models have shown promise for modeling the future dynamics of environment states, enabling agents to reason and act without accessing real environments. Current methods mainly perform single-step or fixed-horizon rollouts, leaving their potential for complex task planning under-exploited. We propose Imagine-then-Plan (ITP), a unified framework for agent learning via lookahead imagination, in which an agent's policy model interacts with the learned world model to yield multi-step "imagined" trajectories. Since the imagination horizon may vary across tasks and stages, we introduce a novel adaptive lookahead mechanism that trades off the ultimate goal against task progress. The resulting imagined trajectories provide rich signals about future consequences, such as achieved progress and potential conflicts, which are fused with current observations to formulate a partially observable and imaginable Markov decision process that guides policy learning. We instantiate ITP with both training-free and reinforcement-trained variants. Extensive experiments across representative agent benchmarks demonstrate that ITP significantly outperforms competitive baselines. Further analyses validate that our adaptive lookahead substantially enhances agents' reasoning capability, providing valuable insights into addressing broader, complex tasks.
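The loop the abstract describes can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: the integer "environment", `toy_world_model`, `toy_policy`, the goal value, and the horizon rule (imagine further when far from the goal, less when nearly done) are all hypothetical stand-ins for the learned models and adaptive lookahead mechanism the paper proposes.

```python
GOAL = 10  # hypothetical task goal for this toy state space

def toy_world_model(state, action):
    """Predict the next state; stands in for a learned world model."""
    return state + action

def toy_policy(state):
    """Propose an action; stands in for the agent's policy model."""
    return 1 if state < GOAL else 0

def adaptive_horizon(state, max_h=5):
    """Toy version of adaptive lookahead: trade off the ultimate goal
    against progress made, imagining further when the goal is distant."""
    remaining = max(GOAL - state, 0)
    return max(1, min(max_h, remaining))

def imagine_trajectory(state, horizon):
    """Roll the policy forward inside the world model (no real env calls)."""
    traj = [state]
    for _ in range(horizon):
        traj.append(toy_world_model(traj[-1], toy_policy(traj[-1])))
    return traj

def imagine_then_plan(state):
    """One decision step: for each candidate action, imagine an
    adaptive-horizon rollout, then act on the imagined outcome that
    lands closest to the goal."""
    h = adaptive_horizon(state)
    best_action, best_gap = 0, float("inf")
    for action in (0, 1):
        nxt = toy_world_model(state, action)
        traj = imagine_trajectory(nxt, h - 1)
        gap = abs(GOAL - traj[-1])
        if gap < best_gap:
            best_action, best_gap = action, gap
    return best_action

# Real-environment loop: each executed action is chosen via imagination.
state = 0
for _ in range(12):
    state = toy_world_model(state, imagine_then_plan(state))
print(state)  # the toy agent reaches the goal value 10 and then holds it
```

The key structural point matches the abstract: action selection queries only the world model (imagination), not the real environment, and the rollout depth is recomputed per step rather than fixed.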