SkillMimic-V2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations
May 4, 2025
Authors: Runyi Yu, Yinhuai Wang, Qihan Zhao, Hok Wai Tsui, Jingbo Wang, Ping Tan, Qifeng Chen
cs.AI
Abstract
We address a fundamental challenge in Reinforcement Learning from Interaction
Demonstration (RLID): demonstration noise and coverage limitations. While
existing data collection approaches provide valuable interaction
demonstrations, they often yield sparse, disconnected, and noisy trajectories
that fail to capture the full spectrum of possible skill variations and
transitions. Our key insight is that, even with noisy and sparse demonstrations,
there exist infinitely many physically feasible trajectories that naturally
bridge demonstrated skills or emerge from their neighboring states, forming a
continuous space of possible skill variations and transitions. Building upon
this insight, we present two data augmentation techniques: a Stitched
Trajectory Graph (STG) that discovers potential transitions between
demonstrated skills, and a State Transition Field (STF) that establishes
unique connections for arbitrary states within the demonstration neighborhood.
To enable effective RLID with augmented data, we develop an Adaptive Trajectory
Sampling (ATS) strategy for dynamic curriculum generation and a historical
encoding mechanism for memory-dependent skill learning. Our approach enables
robust skill acquisition that significantly generalizes beyond the reference
demonstrations. Extensive experiments across diverse interaction tasks
demonstrate substantial improvements over state-of-the-art methods in terms of
convergence stability, generalization capability, and recovery robustness.
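The abstract names Adaptive Trajectory Sampling (ATS) as a way to build a dynamic curriculum but gives no formula. One way to picture such a scheme is the minimal Python sketch below, which samples reference trajectories more often when the policy currently scores poorly on them. The weighting rule, the `floor` parameter, and the function names are all assumptions made for illustration; they are not taken from the paper.

```python
import random


def adaptive_sampling_weights(mean_rewards, floor=0.05):
    """Map per-trajectory mean rewards (assumed to lie in [0, 1]) to
    sampling probabilities. Hypothetical rule: lower reward gives a
    higher probability; `floor` keeps every trajectory reachable."""
    if not mean_rewards:
        raise ValueError("need at least one trajectory")
    # Difficulty proxy: distance from a perfect reward, clamped at zero.
    raw = [max(1.0 - r, 0.0) + floor for r in mean_rewards]
    total = sum(raw)
    return [w / total for w in raw]


def sample_trajectory(mean_rewards, rng=random):
    """Draw one trajectory index according to the adaptive weights."""
    weights = adaptive_sampling_weights(mean_rewards)
    return rng.choices(range(len(mean_rewards)), weights=weights, k=1)[0]
```

In a training loop, `mean_rewards` would be refreshed from recent rollouts so that the sampling distribution shifts as the policy improves; how the actual method tracks and updates these statistics is not specified in the abstract.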