Aligning Agentic World Models via Knowledgeable Experience Learning
January 19, 2026
Authors: Baochang Ren, Yunzhi Yao, Rui Sun, Shuofei Qiao, Ningyu Zhang, Huajun Chen
cs.AI
Abstract
Current Large Language Models (LLMs) exhibit a critical modal disconnect: they possess vast semantic knowledge but lack the procedural grounding to respect the immutable laws of the physical world. Consequently, while these agents implicitly function as world models, their simulations often suffer from physical hallucinations: they generate plans that are logically sound but physically unexecutable. Existing alignment strategies predominantly rely on resource-intensive training or fine-tuning, attempting to compress dynamic environmental rules into static model parameters. However, such parametric encapsulation is inherently rigid, struggling to adapt to the open-ended variability of physical dynamics without continuous, costly retraining. To bridge this gap, we introduce WorldMind, a framework that autonomously constructs a symbolic World Knowledge Repository by synthesizing environmental feedback. Specifically, it unifies Process Experience, which enforces physical feasibility via prediction errors, with Goal Experience, which guides task optimality through successful trajectories. Experiments on EB-ALFRED and EB-Habitat demonstrate that WorldMind outperforms baseline methods while exhibiting remarkable cross-model and cross-environment transferability.
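Since the abstract describes the framework only at a high level, the sketch below illustrates one plausible reading of how a symbolic World Knowledge Repository could accumulate Process Experience (distilled from prediction errors) and Goal Experience (distilled from successful trajectories) and expose both as planner context instead of model parameters. All class and method names here (WorldKnowledgeRepository, add_process_experience, add_goal_experience, as_prompt_context) are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class WorldKnowledgeRepository:
    """Hypothetical symbolic store for the two experience types named in the abstract."""
    process_rules: list = field(default_factory=list)   # physical-feasibility constraints
    goal_traces: list = field(default_factory=list)      # successful action trajectories

    def add_process_experience(self, action: str, predicted: str, observed: str) -> None:
        # A prediction error (simulated outcome differs from environment feedback)
        # is turned into a symbolic rule that future simulations must respect.
        if predicted != observed:
            self.process_rules.append(
                f"after '{action}', the environment yields '{observed}', not '{predicted}'"
            )

    def add_goal_experience(self, trajectory: list, succeeded: bool) -> None:
        # Successful trajectories are kept as reusable guidance toward task optimality.
        if succeeded:
            self.goal_traces.append(trajectory)

    def as_prompt_context(self) -> str:
        # Retrieved knowledge is injected into the planner's prompt, so it can be
        # updated continuously without retraining the underlying LLM.
        rules = "\n".join(f"- {r}" for r in self.process_rules) or "- (none yet)"
        demos = "\n".join(" -> ".join(t) for t in self.goal_traces) or "(none yet)"
        return f"Physical constraints:\n{rules}\n\nSuccessful demonstrations:\n{demos}"


# Example: one interaction step updates both experience types.
repo = WorldKnowledgeRepository()
repo.add_process_experience(
    action="slice apple",
    predicted="apple is sliced",
    observed="a knife must be held first",
)
repo.add_goal_experience(["pick up knife", "slice apple", "place slice on plate"], succeeded=True)
print(repo.as_prompt_context())
```

The design choice this sketch is meant to highlight is the one the abstract argues for: environment rules live in an editable symbolic repository consulted at planning time, rather than being compressed into static model weights.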