SkillMimic-V2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations

Abstract

We address a fundamental challenge in Reinforcement Learning from Interaction Demonstration (RLID): demonstration noise and coverage limitations. While existing data collection approaches provide valuable interaction demonstrations, they often yield sparse, disconnected, and noisy trajectories that fail to capture the full spectrum of possible skill variations and transitions. Our key insight is that, despite the noise and sparsity of the demonstrations, there exist infinitely many physically feasible trajectories that naturally bridge between demonstrated skills or emerge from their neighboring states, forming a continuous space of possible skill variations and transitions. Building on this insight, we present two data augmentation techniques: a Stitched Trajectory Graph (STG) that discovers potential transitions between demonstrated skills, and a State Transition Field (STF) that establishes unique connections for arbitrary states within the demonstration neighborhood. To enable effective RLID with the augmented data, we develop an Adaptive Trajectory Sampling (ATS) strategy for dynamic curriculum generation and a history encoding mechanism for memory-dependent skill learning. Our approach enables robust skill acquisition that generalizes well beyond the reference demonstrations. Extensive experiments across diverse interaction tasks demonstrate substantial improvements over state-of-the-art methods in convergence stability, generalization capability, and recovery robustness.
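The abstract does not specify how the Stitched Trajectory Graph finds transitions. As a rough illustration only, the sketch below assumes one simple reading: treat every frame of every demonstration as a graph node, link consecutive frames within a clip, and add a "stitch" edge wherever frames from two different clips lie within a Euclidean threshold `eps`. A breadth-first search then returns a candidate path from one skill into another. The functions `build_stitch_graph` and `find_transition`, the threshold `eps`, and the brute-force distance test are hypothetical and not taken from the paper, which may use a different similarity measure and may validate stitched segments in simulation.

```python
import math
from collections import deque
from typing import Dict, List, Optional, Tuple

State = Tuple[float, ...]
Node = Tuple[int, int]  # (demonstration index, frame index)


def build_stitch_graph(trajs: List[List[State]], eps: float) -> Dict[Node, List[Node]]:
    """Hypothetical STG-style graph (an assumption, not the paper's algorithm).

    Each frame is a node. Consecutive frames of a clip are linked, and a
    "stitch" edge is added wherever a frame of one clip lies within `eps`
    (Euclidean distance) of a frame of a different clip.
    """
    graph: Dict[Node, List[Node]] = {}
    for i, traj in enumerate(trajs):
        for t, state in enumerate(traj):
            succ = graph.setdefault((i, t), [])
            if t + 1 < len(traj):
                succ.append((i, t + 1))  # transition already in the demonstration
            for j, other in enumerate(trajs):
                if j == i:
                    continue
                for s, other_state in enumerate(other):
                    if math.dist(state, other_state) < eps:
                        succ.append((j, s))  # candidate cross-skill transition
    return graph


def find_transition(graph: Dict[Node, List[Node]], start: Node, goal_traj: int) -> Optional[List[Node]]:
    """Breadth-first search for a fewest-edge path from `start` into any frame of `goal_traj`."""
    prev: Dict[Node, Optional[Node]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node[0] == goal_traj:
            # Walk the predecessor links back to `start`, then reverse.
            path: List[Node] = []
            cur: Optional[Node] = node
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None  # no stitched route reaches the goal skill
```

For example, with a "dribble" clip `[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]`, a "shoot" clip starting at `(2.05, 0.0)`, and `eps=0.1`, `find_transition(graph, (0, 0), 1)` returns `[(0, 0), (0, 1), (0, 2), (1, 0)]`: follow the dribble clip to its last frame, then stitch into the first frame of the shoot clip. The all-pairs distance check is quadratic in the total number of frames; a practical pipeline would likely use a spatial index instead.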
