EgoHumanoid: Unlocking In-the-Wild Loco-Manipulation with Robot-Free Egocentric Demonstration

February 10, 2026
Authors: Modi Shi, Shijia Peng, Jin Chen, Haoran Jiang, Yinghui Li, Di Huang, Ping Luo, Hongyang Li, Li Chen
cs.AI

Abstract

Human demonstrations offer rich environmental diversity and scale naturally, making them an appealing alternative to robot teleoperation. While this paradigm has advanced robot-arm manipulation, its potential for the more challenging, data-hungry problem of humanoid loco-manipulation remains largely unexplored. We present EgoHumanoid, the first framework to co-train a vision-language-action policy using abundant egocentric human demonstrations together with a limited amount of robot data, enabling humanoids to perform loco-manipulation across diverse real-world environments. To bridge the embodiment gap between humans and robots, including discrepancies in physical morphology and viewpoint, we introduce a systematic alignment pipeline spanning hardware design to data processing. We develop a portable system for scalable human data collection and establish practical collection protocols to improve transferability. At the core of our human-to-humanoid alignment pipeline lie two key components: view alignment reduces visual domain discrepancies caused by camera height and perspective variation, and action alignment maps human motions into a unified, kinematically feasible action space for humanoid control. Extensive real-world experiments demonstrate that incorporating robot-free egocentric data improves performance over robot-only baselines by 51%, particularly in unseen environments. Our analysis further reveals which behaviors transfer effectively and the potential for scaling human data.
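
To make the two alignment components more concrete, below is a minimal sketch of what view alignment and action alignment could look like. This is an illustrative assumption, not the paper's implementation: the function names (`align_view`, `align_action`), the rotation-only homography, and the per-joint linear retarget are all stand-ins chosen for simplicity.

```python
# Minimal sketch of the two alignment steps named in the abstract.
# All names, parameters, and specific transforms are illustrative
# assumptions, not the paper's actual pipeline.
import numpy as np
import cv2


def align_view(human_frame: np.ndarray,
               K_human: np.ndarray,
               K_robot: np.ndarray,
               pitch_offset_rad: float) -> np.ndarray:
    """Warp a human egocentric frame toward the robot camera's intrinsics
    and head pitch using a rotation-only homography."""
    # Rotation about the camera x-axis compensating for the pitch difference.
    c, s = np.cos(pitch_offset_rad), np.sin(pitch_offset_rad)
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0,   c,  -s],
                  [0.0,   s,   c]])
    # Homography between two calibrated cameras related by a pure rotation:
    # H = K_robot @ R @ K_human^{-1}.
    H = K_robot @ R @ np.linalg.inv(K_human)
    h, w = human_frame.shape[:2]
    return cv2.warpPerspective(human_frame, H, (w, h))


def align_action(human_joints: np.ndarray,
                 scale: np.ndarray,
                 lower: np.ndarray,
                 upper: np.ndarray) -> np.ndarray:
    """Map retargeted human joint angles into the robot's action space via a
    per-joint linear scaling, then clamp to the robot's kinematic limits so
    the resulting command is feasible for the humanoid."""
    robot_joints = scale * human_joints
    return np.clip(robot_joints, lower, upper)
```

A rotation-only homography is exact only when the two viewpoints differ by pure rotation; fully compensating the height offset between a human head and a robot head camera would in general require scene depth, which is one reason this should be read as a sketch rather than a faithful reconstruction of the paper's view-alignment module.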