SkillMimic-V2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations

Abstract


We address a fundamental challenge in Reinforcement Learning from Interaction Demonstration (RLID): demonstration noise and coverage limitations. While existing data collection approaches provide valuable interaction demonstrations, they often yield sparse, disconnected, and noisy trajectories that fail to capture the full spectrum of possible skill variations and transitions. Our key insight is that, even when demonstrations are noisy and sparse, there exist infinitely many physically feasible trajectories that naturally bridge demonstrated skills or emerge from their neighboring states, forming a continuous space of possible skill variations and transitions. Building on this insight, we present two data augmentation techniques: a Stitched Trajectory Graph (STG) that discovers potential transitions between demonstrated skills, and a State Transition Field (STF) that establishes unique connections for arbitrary states within the demonstration neighborhood. To enable effective RLID with augmented data, we develop an Adaptive Trajectory Sampling (ATS) strategy for dynamic curriculum generation and a historical encoding mechanism for memory-dependent skill learning. Our approach enables robust skill acquisition that generalizes significantly beyond the reference demonstrations. Extensive experiments across diverse interaction tasks demonstrate substantial improvements over state-of-the-art methods in convergence stability, generalization capability, and recovery robustness.
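The abstract does not say how STG or ATS are built. The sketch below is a rough illustration only and rests on several assumptions that are not from the source:

- (a) Each demonstration is a list of fixed-length state vectors.
- (b) An STG-like graph can be formed by linking each frame to its successor, plus "stitch" edges to frames of other trajectories whose states lie within a Euclidean distance threshold.
- (c) Adaptive sampling can be approximated by drawing start frames with a probability that grows as a running-average tracking reward shrinks.

The names `build_stitched_graph`, `update_reward` and `adaptive_sample` are hypothetical, as are the parameters `threshold`, `alpha` and `eps`. This is not the authors' implementation.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

State = Tuple[float, ...]  # one demonstration frame (hypothetical flat state vector)
Node = Tuple[int, int]     # (trajectory index, frame index)


def build_stitched_graph(
    trajectories: Sequence[Sequence[State]], threshold: float
) -> Dict[Node, List[Node]]:
    """Hypothetical STG-style graph: each frame links to its successor in the
    same trajectory, plus 'stitch' edges to frames of other trajectories whose
    states lie within `threshold` (Euclidean distance)."""
    graph: Dict[Node, List[Node]] = {}
    for ti, traj in enumerate(trajectories):
        for fi, state in enumerate(traj):
            edges: List[Node] = []
            if fi + 1 < len(traj):
                edges.append((ti, fi + 1))  # follow the original demonstration
            for tj, other in enumerate(trajectories):
                if tj == ti:
                    continue
                for fj, candidate in enumerate(other):
                    if math.dist(state, candidate) <= threshold:
                        edges.append((tj, fj))  # candidate cross-skill transition
            graph[(ti, fi)] = edges
    return graph


def update_reward(
    avg_reward: Dict[Node, float], node: Node, reward: float, alpha: float = 0.1
) -> None:
    """Exponential moving average of the tracking reward observed from `node`."""
    old = avg_reward.get(node, 0.0)
    avg_reward[node] = (1.0 - alpha) * old + alpha * reward


def adaptive_sample(
    nodes: Sequence[Node],
    avg_reward: Dict[Node, float],
    rng: random.Random,
    eps: float = 0.05,
) -> Node:
    """Pick a start frame, favoring frames with low average reward (assumed in [0, 1]).
    `eps` keeps weights positive so well-learned frames are still revisited."""
    weights = [max(0.0, 1.0 - avg_reward.get(n, 0.0)) + eps for n in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


if __name__ == "__main__":
    demos = [
        [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],  # skill A
        [(2.05, 0.0), (3.0, 0.0)],             # skill B begins near A's last frame
    ]
    g = build_stitched_graph(demos, threshold=0.1)
    print(g[(0, 2)])  # [(1, 0)]: A's final frame stitches into B
    rewards: Dict[Node, float] = {}
    update_reward(rewards, (0, 0), 1.0, alpha=0.5)
    print(adaptive_sample(list(g), rewards, random.Random(0)))
```

The pairwise search above is quadratic in the total number of frames and uses plain Euclidean distance. A practical system would likely need a spatial index and a state distance suited to the character and the object, which the abstract does not specify.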
