UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations
May 13, 2025
Authors: Hanjung Kim, Jaehyun Kang, Hyolim Kang, Meedeum Cho, Seon Joo Kim, Youngwoon Lee
cs.AI
Abstract
Mimicry is a fundamental learning mechanism in humans, enabling individuals to learn new tasks by observing and imitating experts. However, applying this ability to robots presents significant challenges due to the inherent differences between human and robot embodiments in both their visual appearance and physical capabilities. While previous methods bridge this gap using cross-embodiment datasets with shared scenes and tasks, collecting such aligned data between humans and robots at scale is not trivial. In this paper, we propose UniSkill, a novel framework that learns embodiment-agnostic skill representations from large-scale cross-embodiment video data without any labels, enabling skills extracted from human video prompts to effectively transfer to robot policies trained only on robot data. Our experiments in both simulation and real-world environments show that our cross-embodiment skills successfully guide robots in selecting appropriate actions, even with unseen video prompts. The project website can be found at: https://kimhanjung.github.io/UniSkill.
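
To make the high-level pipeline concrete, below is a minimal, illustrative Python/PyTorch sketch of a generic skill-conditioned setup in the spirit of the abstract: a skill encoder maps pairs of video frames to a latent skill vector, and a robot policy consumes that vector alongside the current observation. This is not the paper's actual architecture; all module names, layer sizes, dimensions, and the two-frame encoding are assumptions made for illustration only.

```python
# Illustrative sketch only: module names, shapes, and the two-frame skill
# encoding are assumptions, not UniSkill's actual architecture.
import torch
import torch.nn as nn


class SkillEncoder(nn.Module):
    """Maps a pair of video frames (current, future) to a latent skill vector.

    The idea (per the abstract) is to train such an encoder on unlabeled
    cross-embodiment videos, so the latent captures the skill being performed
    rather than who performs it.
    """

    def __init__(self, skill_dim: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(  # stand-in for a visual encoder
            nn.Conv2d(6, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, skill_dim),
        )

    def forward(self, frame_t: torch.Tensor, frame_tk: torch.Tensor) -> torch.Tensor:
        # Concatenate the two RGB frames along the channel axis.
        return self.backbone(torch.cat([frame_t, frame_tk], dim=1))


class SkillConditionedPolicy(nn.Module):
    """Predicts a robot action from the current observation and a skill vector."""

    def __init__(self, obs_dim: int = 128, skill_dim: int = 64, act_dim: int = 7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + skill_dim, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs: torch.Tensor, skill: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, skill], dim=-1))


# At deployment, a skill extracted from a human demonstration video prompt
# conditions a policy that was trained only on robot data.
encoder = SkillEncoder()
policy = SkillConditionedPolicy()
human_frames = torch.randn(1, 3, 96, 96), torch.randn(1, 3, 96, 96)
skill = encoder(*human_frames)               # embodiment-agnostic skill latent
action = policy(torch.randn(1, 128), skill)  # robot action proposal
print(action.shape)                          # torch.Size([1, 7])
```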