OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction
September 30, 2025
Authors: Lujie Yang, Xiaoyu Huang, Zhen Wu, Angjoo Kanazawa, Pieter Abbeel, Carmelo Sferrazza, C. Karen Liu, Rocky Duan, Guanya Shi
cs.AI
Abstract
A dominant paradigm for teaching humanoid robots complex skills is to
retarget human motions as kinematic references to train reinforcement learning
(RL) policies. However, existing retargeting pipelines often struggle with the
significant embodiment gap between humans and robots, producing physically
implausible artifacts like foot-skating and penetration. More importantly,
common retargeting methods neglect the rich human-object and human-environment
interactions essential for expressive locomotion and loco-manipulation. To
address this, we introduce OmniRetarget, an interaction-preserving data
generation engine based on an interaction mesh that explicitly models and
preserves the crucial spatial and contact relationships between an agent, the
terrain, and manipulated objects. By minimizing the Laplacian deformation
between the human and robot meshes while enforcing kinematic constraints,
OmniRetarget generates kinematically feasible trajectories. Moreover,
preserving task-relevant interactions enables efficient data augmentation, from
a single demonstration to different robot embodiments, terrains, and object
configurations. We comprehensively evaluate OmniRetarget by retargeting motions
from OMOMO, LAFAN1, and our in-house MoCap datasets, generating over 8 hours of
trajectories that achieve better kinematic constraint satisfaction and contact
preservation than widely used baselines. Such high-quality data enables
proprioceptive RL policies to successfully execute long-horizon (up to 30
seconds) parkour and loco-manipulation skills on a Unitree G1 humanoid, trained
with only 5 reward terms and simple domain randomization shared by all tasks,
without any learning curriculum.
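
As context for the method summary above, here is a minimal sketch of the objective the abstract describes, following the standard interaction-mesh Laplacian deformation energy (Ho et al., SIGGRAPH 2010); the notation below ($p$ for human-demonstration vertices, $q(\theta)$ for robot vertices placed by forward kinematics of joint configuration $\theta$, $\mathcal{N}(i)$ for mesh neighbors, $w_{ij}$ for normalized weights) is illustrative rather than taken from the paper. The Laplacian coordinate of vertex $i$ is

$$\delta_i(x) = x_i - \sum_{j \in \mathcal{N}(i)} w_{ij}\, x_j, \qquad \sum_{j \in \mathcal{N}(i)} w_{ij} = 1,$$

and retargeting each frame amounts to a constrained minimization of the change in these coordinates,

$$\min_{\theta} \; \sum_i \bigl\lVert \delta_i\bigl(q(\theta)\bigr) - \delta_i(p) \bigr\rVert^2 \quad \text{s.t.} \quad \theta_{\min} \le \theta \le \theta_{\max}, \;\; \text{contact and non-penetration constraints.}$$

Because the mesh is built jointly over agent, terrain, and object vertices, keeping each $\delta_i$ close to its human-demonstration value is what preserves the spatial and contact relationships among all three. This also suggests how a single demonstration supports augmentation: the same minimization can be re-solved under a perturbed robot embodiment, terrain, or object configuration.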