OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction

Abstract

A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references for training reinforcement learning (RL) policies. However, existing retargeting pipelines often struggle with the significant embodiment gap between humans and robots, producing physically implausible artifacts such as foot skating and penetration. More importantly, common retargeting methods neglect the rich human-object and human-environment interactions essential for expressive locomotion and loco-manipulation. To address this, we introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh that explicitly models and preserves the crucial spatial and contact relationships between an agent, the terrain, and manipulated objects. By minimizing the Laplacian deformation between the human and robot meshes while enforcing kinematic constraints, OmniRetarget generates kinematically feasible trajectories. Moreover, preserving task-relevant interactions enables efficient data augmentation from a single demonstration to different robot embodiments, terrains, and object configurations. We comprehensively evaluate OmniRetarget by retargeting motions from OMOMO, LAFAN1, and our in-house MoCap datasets, generating over 8 hours of trajectories that achieve better kinematic constraint satisfaction and contact preservation than widely used baselines. Such high-quality data enables proprioceptive RL policies to successfully execute long-horizon (up to 30 seconds) parkour and loco-manipulation skills on a Unitree G1 humanoid. The policies are trained with only 5 reward terms and simple domain randomization shared by all tasks, without any learning curriculum.
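
The Laplacian deformation objective mentioned in the abstract can be illustrated with a minimal sketch. It rests on assumptions that do not come from the text: uniform neighbor weights, a hand-specified neighbor list instead of a mesh built from the scene, and hypothetical names (`laplacian_coordinates`, `laplacian_deformation`, `human`, `shifted`, `bent`). It only evaluates the deformation energy between two point sets. It does not enforce kinematic constraints or solve for robot joint angles, so it should not be read as the authors' implementation.

```python
from typing import Dict, List, Sequence

Point = Sequence[float]


def laplacian_coordinates(
    points: List[Point], neighbors: Dict[int, List[int]]
) -> List[List[float]]:
    """Return each point minus the mean of its neighbors (uniform weights, an assumption)."""
    coords = []
    for i, p in enumerate(points):
        nbrs = neighbors[i]
        mean = [sum(points[j][k] for j in nbrs) / len(nbrs) for k in range(3)]
        coords.append([p[k] - mean[k] for k in range(3)])
    return coords


def laplacian_deformation(
    source: List[Point], target: List[Point], neighbors: Dict[int, List[int]]
) -> float:
    """Sum of squared differences between source and target Laplacian coordinates."""
    ls = laplacian_coordinates(source, neighbors)
    lt = laplacian_coordinates(target, neighbors)
    return sum((a - b) ** 2 for u, v in zip(ls, lt) for a, b in zip(u, v))


# Hypothetical four-point "interaction mesh": every point neighbors every other point.
neighbors = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
human = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

# Rigidly translating all points keeps their relative layout unchanged.
shifted = [(x + 1.0, y + 2.0, z - 0.5) for x, y, z in human]

# Moving a single point changes its relation to the others.
bent = [human[0], human[1], human[2], (0.0, 0.0, 2.0)]

energy_shifted = laplacian_deformation(human, shifted, neighbors)
energy_bent = laplacian_deformation(human, bent, neighbors)
print(energy_shifted, energy_bent)
```

In this toy setup the translated copy has an energy of zero up to rounding, because Laplacian coordinates depend only on where points sit relative to their neighbors, while moving a single point yields a positive energy. This is the sense in which minimizing such an energy preserves spatial relationships rather than absolute positions.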