SkillMimic-V2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations

Abstract

We address a fundamental challenge in Reinforcement Learning from Interaction Demonstration (RLID): demonstration noise and coverage limitations. While existing data collection approaches provide valuable interaction demonstrations, they often yield sparse, disconnected, and noisy trajectories that fail to capture the full spectrum of possible skill variations and transitions. Our key insight is that, even when demonstrations are noisy and sparse, there exist infinitely many physically feasible trajectories that naturally bridge demonstrated skills or emerge from their neighboring states, forming a continuous space of possible skill variations and transitions. Building on this insight, we present two data augmentation techniques: a Stitched Trajectory Graph (STG) that discovers potential transitions between demonstrated skills, and a State Transition Field (STF) that establishes unique connections for arbitrary states within the demonstration neighborhood. To enable effective RLID with the augmented data, we develop an Adaptive Trajectory Sampling (ATS) strategy for dynamic curriculum generation and a historical encoding mechanism for memory-dependent skill learning. Our approach enables robust skill acquisition that generalizes significantly beyond the reference demonstrations. Extensive experiments across diverse interaction tasks demonstrate substantial improvements over state-of-the-art methods in convergence stability, generalization capability, and recovery robustness.
