X-Sim: Cross-Embodiment Learning via Real-to-Sim-to-Real
May 11, 2025
Authors: Prithwish Dan, Kushal Kedia, Angela Chao, Edward Weiyi Duan, Maximus Adrian Pace, Wei-Chiu Ma, Sanjiban Choudhury
cs.AI
Abstract
Human videos offer a scalable way to train robot manipulation policies, but
lack the action labels needed by standard imitation learning algorithms.
Existing cross-embodiment approaches try to map human motion to robot actions,
but often fail when the embodiments differ significantly. We propose X-Sim, a
real-to-sim-to-real framework that uses object motion as a dense and
transferable signal for learning robot policies. X-Sim starts by reconstructing
a photorealistic simulation from an RGBD human video and tracking object
trajectories to define object-centric rewards. These rewards are used to train
a reinforcement learning (RL) policy in simulation. The learned policy is then
distilled into an image-conditioned diffusion policy using synthetic rollouts
rendered with varied viewpoints and lighting. To transfer to the real world,
X-Sim introduces an online domain adaptation technique that aligns real and
simulated observations during deployment. Importantly, X-Sim does not require
any robot teleoperation data. We evaluate it across 5 manipulation tasks in 2
environments and show that it: (1) improves task progress by 30% on average
over hand-tracking and sim-to-real baselines, (2) matches behavior cloning with
10x less data collection time, and (3) generalizes to new camera viewpoints and
test-time changes. Code and videos are available at
https://portal-cornell.github.io/X-Sim/.
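The abstract states that object trajectories tracked from the RGBD human video define object-centric rewards for RL in simulation, but it does not give the reward's exact form. The Python sketch below is an illustration only. It assumes the tracked trajectory is a per-step sequence of object positions and unit quaternions, and it assumes the reward compares the simulated object pose at step t against the tracked pose at the same step. The function names, the exponential shaping, and the constants `pos_scale` and `rot_weight` are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence, Tuple

# Hypothetical sketch of an object-centric tracking reward; not X-Sim's published formula.
Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z), assumed unit-norm


def rotation_error(q1: Quat, q2: Quat) -> float:
    """Geodesic angle in radians between two unit quaternions.

    abs() accounts for the double cover: q and -q encode the same rotation.
    """
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    return 2.0 * math.acos(min(1.0, dot))  # clamp guards against round-off above 1


def object_centric_reward(
    sim_pos: Vec3,
    sim_quat: Quat,
    ref_positions: Sequence[Vec3],
    ref_quats: Sequence[Quat],
    t: int,
    pos_scale: float = 10.0,  # hypothetical shaping constant
    rot_weight: float = 0.1,  # hypothetical weight on orientation error
) -> float:
    """Dense reward for matching the object pose tracked in the human video at step t.

    Steps past the end of the reference trajectory are compared against its
    final pose, so the policy keeps being rewarded for holding the goal pose.
    """
    k = min(t, len(ref_positions) - 1)
    pos_err = math.dist(sim_pos, ref_positions[k])
    rot_err = rotation_error(sim_quat, ref_quats[k])
    return math.exp(-pos_scale * pos_err) - rot_weight * rot_err


if __name__ == "__main__":
    ref_p = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0)]
    ref_q = [(1.0, 0.0, 0.0, 0.0)] * 3
    # Object exactly on the reference pose at t=1: exp(0) - 0.1 * 0 = 1.0
    print(object_centric_reward((0.1, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0), ref_p, ref_q, t=1))
```

Exponentiating the position error bounds that term in (0, 1], which keeps the signal dense without letting large early errors dominate. Because the reward depends only on the object's pose and not on how the object was moved, it does not require mapping human hand motion onto the robot. This property is what makes an object-centric signal transferable across embodiments.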