UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations
May 13, 2025
Authors: Hanjung Kim, Jaehyun Kang, Hyolim Kang, Meedeum Cho, Seon Joo Kim, Youngwoon Lee
cs.AI
Abstract
Mimicry is a fundamental learning mechanism in humans, enabling individuals to learn new tasks by observing and imitating experts. However, applying this ability to robots presents significant challenges due to the inherent differences between human and robot embodiments in both their visual appearance and physical capabilities. While previous methods bridge this gap using cross-embodiment datasets with shared scenes and tasks, collecting such aligned data between humans and robots at scale is not trivial. In this paper, we propose UniSkill, a novel framework that learns embodiment-agnostic skill representations from large-scale cross-embodiment video data without any labels, enabling skills extracted from human video prompts to effectively transfer to robot policies trained only on robot data. Our experiments in both simulation and real-world environments show that our cross-embodiment skills successfully guide robots in selecting appropriate actions, even with unseen video prompts. The project website can be found at: https://kimhanjung.github.io/UniSkill.
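
To make the high-level pipeline concrete, below is a minimal, illustrative Python/PyTorch sketch of a generic skill-conditioned setup in the spirit of the abstract: a skill encoder maps pairs of video frames to a latent skill vector, and a robot policy consumes that vector alongside the current observation. This is not the paper's actual architecture; all module names, layer sizes, dimensions, and the two-frame encoding are assumptions made for illustration only.

```python
# Illustrative sketch only: module names, shapes, and the two-frame skill
# encoding are assumptions, not UniSkill's actual architecture.
import torch
import torch.nn as nn


class SkillEncoder(nn.Module):
    """Maps a pair of video frames (current, future) to a latent skill vector.

    The idea (per the abstract) is to train such an encoder on unlabeled
    cross-embodiment videos, so the latent captures the skill being performed
    rather than who performs it.
    """

    def __init__(self, skill_dim: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(  # stand-in for a visual encoder
            nn.Conv2d(6, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, skill_dim),
        )

    def forward(self, frame_t: torch.Tensor, frame_tk: torch.Tensor) -> torch.Tensor:
        # Concatenate the two RGB frames along the channel axis.
        return self.backbone(torch.cat([frame_t, frame_tk], dim=1))


class SkillConditionedPolicy(nn.Module):
    """Predicts a robot action from the current observation and a skill vector."""

    def __init__(self, obs_dim: int = 128, skill_dim: int = 64, act_dim: int = 7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + skill_dim, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs: torch.Tensor, skill: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, skill], dim=-1))


# At deployment, a skill extracted from a human demonstration video prompt
# conditions a policy that was trained only on robot data.
encoder = SkillEncoder()
policy = SkillConditionedPolicy()
human_frames = torch.randn(1, 3, 96, 96), torch.randn(1, 3, 96, 96)
skill = encoder(*human_frames)               # embodiment-agnostic skill latent
action = policy(torch.randn(1, 128), skill)  # robot action proposal
print(action.shape)                          # torch.Size([1, 7])
```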