AnchorWorld: Embodied Egocentric World Simulation with View-based Evolution Customization
June 5, 2026
Authors: Yu Li, Menghan Xia, Gongye Liu, Xintao Wang, Conglang Zhang, Lei Ke, Yuxuan Lin, Ruihang Chu, Pengfei Wan, Kun Gai, Yujiu Yang
cs.AI
Abstract
Despite being a pivotal frontier, interactive world modeling remains underexplored in terms of the versatile controllability required by practical scenarios. To bridge this gap, we present AnchorWorld, a framework that advances egocentric simulation through enhanced interaction integrity and a flexible mechanism for world customization. First, we utilize 3D human motion as the primary interaction modality. To compensate for body parts that are out of view or truncated in egocentric views, we introduce an auxiliary training supervision that incorporates exogenous viewpoints decoupled from the agent's first-person sensorium. This supervision allows the model to observe the agent's full-body positioning relative to the environment, facilitating a more robust spatial grounding of human-world interactions. Furthermore, we propose a simple yet effective mechanism for customizing self-evolving worlds. This is achieved by defining anchor views within a unified world coordinate system, coupled with textual descriptions dictating the dynamic evolution of local scenes. Experimental results show that AnchorWorld significantly outperforms state-of-the-art baselines, while ablation studies validate the effectiveness of our key designs. Notably, our customization scheme exhibits promising spatio-temporal geometric consistency and adheres strictly to the prescribed evolutionary dynamics.