

EgoHumanoid: Unlocking In-the-Wild Loco-Manipulation with Robot-Free Egocentric Demonstration

February 10, 2026
Authors: Modi Shi, Shijia Peng, Jin Chen, Haoran Jiang, Yinghui Li, Di Huang, Ping Luo, Hongyang Li, Li Chen
cs.AI

Abstract

Human demonstrations offer rich environmental diversity and scale naturally, making them an appealing alternative to robot teleoperation. While this paradigm has advanced robot-arm manipulation, its potential for the more challenging, data-hungry problem of humanoid loco-manipulation remains largely unexplored. We present EgoHumanoid, the first framework to co-train a vision-language-action policy on abundant egocentric human demonstrations together with a limited amount of robot data, enabling humanoids to perform loco-manipulation across diverse real-world environments. To bridge the embodiment gap between humans and robots, including discrepancies in physical morphology and viewpoint, we introduce a systematic alignment pipeline spanning hardware design to data processing: we develop a portable system for scalable human data collection and establish practical collection protocols to improve transferability. At the core of our human-to-humanoid alignment pipeline lie two key components. View alignment reduces visual domain discrepancies caused by differences in camera height and perspective. Action alignment maps human motions into a unified, kinematically feasible action space for humanoid control. Extensive real-world experiments show that incorporating robot-free egocentric data improves performance over robot-only baselines by 51%, particularly in unseen environments. Our analysis further reveals which behaviors transfer effectively and the potential of scaling human data.
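To make the two alignment components concrete, below is a minimal Python sketch of what such a pipeline could look like. Everything here is an illustrative assumption rather than the paper's released implementation: the function names (align_view, align_action), the homography-based pitch correction for view alignment, and the clamped similarity transform for action retargeting are all stand-ins for whatever the authors actually use.

    # Minimal sketch of the view/action alignment steps described in the
    # abstract. All names and parameters are illustrative assumptions.
    import numpy as np
    import cv2

    def align_view(frame, K, pitch_deg):
        """Warp an egocentric human frame toward the robot camera's viewing
        angle. The homography K @ R @ K^-1 is exact only for a pure camera
        rotation; correcting a height offset exactly would require depth."""
        theta = np.deg2rad(pitch_deg)
        # Rotation about the camera x-axis (pitch).
        R = np.array([
            [1.0, 0.0, 0.0],
            [0.0, np.cos(theta), -np.sin(theta)],
            [0.0, np.sin(theta), np.cos(theta)],
        ])
        H = K @ R @ np.linalg.inv(K)
        h, w = frame.shape[:2]
        return cv2.warpPerspective(frame, H, (w, h))

    def align_action(wrist_pos_human, scale, offset, lo, hi):
        """Map a human wrist position into a shared robot action space via a
        similarity transform, then clamp to kinematically feasible bounds."""
        p = scale * np.asarray(wrist_pos_human) + offset
        return np.clip(p, lo, hi)

In this reading, view alignment edits the pixels so human and robot observations look drawn from one visual domain, while action alignment edits the labels so both data sources supervise the same kinematically feasible control space; the co-trained policy then sees human and robot episodes in a unified observation-action format.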