EgoSim: Egocentric World Simulator for Embodied Interaction Generation
April 1, 2026
Authors: Jinkun Hao, Mingda Jia, Ruiyan Wang, Xihui Liu, Ran Yi, Lizhuang Ma, Jiangmiao Pang, Xudong Xu
cs.AI
Abstract
We introduce EgoSim, a closed-loop egocentric world simulator that generates spatially consistent interaction videos and persistently updates the underlying 3D scene state for continuous simulation. Existing egocentric simulators either lack explicit 3D grounding, which causes structural drift under viewpoint changes, or treat the scene as static and therefore fail to update the world state across multi-stage interactions. EgoSim addresses both limitations by modeling 3D scenes as updatable world states. We generate embodied interactions with a Geometry-action-aware Observation Simulation model and ensure spatial consistency with an Interaction-aware State Updating module. Densely aligned scene-interaction training pairs are difficult to acquire, which creates a critical data bottleneck. To overcome it, we design a scalable pipeline that extracts static point clouds, camera trajectories, and embodiment actions from large-scale, in-the-wild monocular egocentric videos. We further introduce EgoCap, a capture system that enables low-cost real-world data collection with uncalibrated smartphones. Extensive experiments demonstrate that EgoSim significantly outperforms existing methods in visual quality, spatial consistency, and generalization to complex scenes and in-the-wild dexterous interactions, while also supporting cross-embodiment transfer to robotic manipulation. Code and datasets will be released soon. The project page is at egosimulator.github.io.