OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction

Abstract

A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions into kinematic references that are then used to train reinforcement learning (RL) policies. However, existing retargeting pipelines often struggle with the significant embodiment gap between humans and robots, producing physically implausible artifacts such as foot skating and penetration. More importantly, common retargeting methods neglect the rich human-object and human-environment interactions that are essential for expressive locomotion and loco-manipulation. To address this, we introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh that explicitly models and preserves the crucial spatial and contact relationships between an agent, the terrain, and manipulated objects. By minimizing the Laplacian deformation between the human and robot meshes while enforcing kinematic constraints, OmniRetarget generates kinematically feasible trajectories. Moreover, preserving task-relevant interactions enables efficient data augmentation: a single demonstration can be extended to different robot embodiments, terrains, and object configurations. We comprehensively evaluate OmniRetarget by retargeting motions from the OMOMO, LAFAN1, and our in-house MoCap datasets, generating more than 8 hours of trajectories that achieve better kinematic constraint satisfaction and contact preservation than widely used baselines. Such high-quality data enables proprioceptive RL policies to successfully execute long-horizon (up to 30 seconds) parkour and loco-manipulation skills on a Unitree G1 humanoid. These policies are trained with only 5 reward terms and simple domain randomization shared across all tasks, without any learning curriculum.
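
The Laplacian deformation objective mentioned in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation: it assumes uniform neighbor weights, a hand-specified neighbor list standing in for the interaction mesh, and it omits the kinematic constraints entirely. The function names `laplacian_coords` and `deformation_energy` are hypothetical and chosen only for this example.

```python
import numpy as np

def laplacian_coords(points, neighbors):
    """Uniform-weight Laplacian coordinates: each point minus the mean of its neighbors.

    points:    (N, 3) array of keypoint positions (body, object, or terrain samples).
    neighbors: list of N integer lists giving mesh neighbors (assumed, not computed here).
    """
    return np.stack([points[i] - points[nbrs].mean(axis=0)
                     for i, nbrs in enumerate(neighbors)])

def deformation_energy(source_pts, target_pts, neighbors):
    """Sum of squared differences between source and target Laplacian coordinates."""
    diff = laplacian_coords(source_pts, neighbors) - laplacian_coords(target_pts, neighbors)
    return float(np.sum(diff ** 2))

# Toy "interaction mesh": four fully connected keypoints (illustrative values only).
human = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
neighbors = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]]

shifted = human + np.array([0.5, -0.2, 1.0])   # rigid translation of all keypoints
scaled = 2.0 * human                           # changes relative spacing between keypoints

print(deformation_energy(human, shifted, neighbors))
print(deformation_energy(human, scaled, neighbors))
```

Because Laplacian coordinates encode each keypoint relative to its neighbors, a pure translation leaves the energy at zero while a change in relative spacing is penalized. This is the sense in which such an objective preserves spatial relationships rather than absolute positions. A retargeting method along these lines would minimize this kind of energy over robot configurations subject to constraints, which the sketch does not attempt.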