SkillMimic-V2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations
May 4, 2025
Authors: Runyi Yu, Yinhuai Wang, Qihan Zhao, Hok Wai Tsui, Jingbo Wang, Ping Tan, Qifeng Chen
cs.AI
Abstract
We address a fundamental challenge in Reinforcement Learning from Interaction
Demonstration (RLID): demonstration noise and coverage limitations. While
existing data collection approaches provide valuable interaction
demonstrations, they often yield sparse, disconnected, and noisy trajectories
that fail to capture the full spectrum of possible skill variations and
transitions. Our key insight is that despite noisy and sparse demonstrations,
there exist infinitely many physically feasible trajectories that naturally bridge
demonstrated skills or emerge from their neighboring states, forming a
continuous space of possible skill variations and transitions. Building upon
this insight, we present two data augmentation techniques: a Stitched
Trajectory Graph (STG) that discovers potential transitions between
demonstration skills, and a State Transition Field (STF) that establishes
unique connections for arbitrary states within the demonstration neighborhood.
To enable effective RLID with augmented data, we develop an Adaptive Trajectory
Sampling (ATS) strategy for dynamic curriculum generation and a historical
encoding mechanism for memory-dependent skill learning. Our approach enables
robust skill acquisition that significantly generalizes beyond the reference
demonstrations. Extensive experiments across diverse interaction tasks
demonstrate substantial improvements over state-of-the-art methods in terms of
convergence stability, generalization capability, and recovery robustness.
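
The abstract names Adaptive Trajectory Sampling (ATS) as a dynamic curriculum but does not say how trajectories are weighted. The sketch below is only one plausible reading, not the authors' implementation: it assumes each reference trajectory keeps a running failure-rate estimate and is sampled in proportion to it, so harder trajectories are practiced more often. The class name, the `floor` and `momentum` parameters, and the update rule are all hypothetical.

```python
import random


class AdaptiveTrajectorySampler:
    """Hypothetical curriculum sampler: harder trajectories are drawn more often."""

    def __init__(self, num_trajectories, floor=0.05, momentum=0.9, seed=0):
        # Assumption: every trajectory starts as "always failing" so that
        # all of them are visited early in training.
        self.fail_rate = [1.0] * num_trajectories
        self.floor = floor          # minimum weight, so mastered clips are still revisited
        self.momentum = momentum    # smoothing factor for the running failure estimate
        self.rng = random.Random(seed)

    def weights(self):
        # Clamp each failure rate from below, then normalize to a distribution.
        raw = [max(rate, self.floor) for rate in self.fail_rate]
        total = sum(raw)
        return [w / total for w in raw]

    def sample(self):
        # Draw one trajectory index according to the current weights.
        indices = range(len(self.fail_rate))
        return self.rng.choices(indices, weights=self.weights(), k=1)[0]

    def update(self, index, success):
        # Exponential moving average of failure (1.0) versus success (0.0).
        target = 0.0 if success else 1.0
        old = self.fail_rate[index]
        self.fail_rate[index] = self.momentum * old + (1.0 - self.momentum) * target


# Usage: report rollout outcomes, then draw the next reference trajectory.
sampler = AdaptiveTrajectorySampler(num_trajectories=3)
for _ in range(50):
    sampler.update(0, success=True)  # trajectory 0 is now well imitated
next_index = sampler.sample()
```

The weight floor is a design choice of this sketch: without it, a trajectory that is imitated perfectly would stop being sampled and could be forgotten later in training.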