

Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization

January 19, 2026
Authors: Hao Luo, Ye Wang, Wanpeng Zhang, Sipeng Zheng, Ziheng Xi, Chaoyi Xu, Haiweng Xu, Haoqi Yuan, Chi Zhang, Yiqing Wang, Yicheng Feng, Zongqing Lu
cs.AI

Abstract

We introduce Being-H0.5, a foundational Vision-Language-Action (VLA) model designed for robust cross-embodiment generalization across diverse robotic platforms. While existing VLAs often struggle with morphological heterogeneity and data scarcity, we propose a human-centric learning paradigm that treats human interaction traces as a universal "mother tongue" for physical interaction. To support this, we present UniHand-2.0, the largest embodied pre-training recipe to date, comprising over 35,000 hours of multimodal data across 30 distinct robotic embodiments. Our approach introduces a Unified Action Space that maps heterogeneous robot controls into semantically aligned slots, enabling low-resource robots to bootstrap skills from human data and high-resource platforms. Built upon this human-centric foundation, we design a unified sequential modeling and multi-task pre-training paradigm to bridge human demonstrations and robotic execution. Architecturally, Being-H0.5 utilizes a Mixture-of-Transformers design featuring a novel Mixture-of-Flow (MoF) framework to decouple shared motor primitives from specialized embodiment-specific experts. Finally, to make cross-embodiment policies stable in the real world, we introduce Manifold-Preserving Gating for robustness under sensory shift and Universal Async Chunking to universalize chunked control across embodiments with different latency and control profiles. We empirically demonstrate that Being-H0.5 achieves state-of-the-art results on simulated benchmarks, such as LIBERO (98.9%) and RoboCasa (53.9%), while also exhibiting strong cross-embodiment capabilities on five robotic platforms.
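To make the Unified Action Space idea concrete, the following is a minimal, hypothetical sketch: the slot layout, class names, and the example embodiment spec are assumptions for illustration, not the released Being-H0.5 interface. The sketch shows how each embodiment's native control vector could be scattered into fixed, semantically aligned slots with a validity mask, so robots with different degrees of freedom share one action representation.

```python
# Hypothetical sketch of a "Unified Action Space" (names and layout are
# assumptions, not the Being-H0.5 API): every embodiment's control vector is
# scattered into a fixed set of semantically aligned slots, so robots with
# different DoF counts share one action representation.

from dataclasses import dataclass
import numpy as np

# Fixed semantic layout shared by all embodiments (assumed for illustration).
SLOTS = {
    "left_arm_eef": 7,    # xyz + quaternion
    "right_arm_eef": 7,
    "left_gripper": 1,
    "right_gripper": 1,
    "base": 3,            # vx, vy, yaw rate
}
SLOT_OFFSETS = {}
_off = 0
for _name, _dim in SLOTS.items():
    SLOT_OFFSETS[_name] = (_off, _off + _dim)
    _off += _dim
UNIFIED_DIM = _off  # 19 in this sketch


@dataclass
class EmbodimentSpec:
    """Maps a robot's native action dimensions onto the shared semantic slots."""
    name: str
    slot_map: dict  # (native start, native stop) -> unified slot name


def to_unified(native_action: np.ndarray, spec: EmbodimentSpec):
    """Scatter a native action vector into the unified space.

    Returns (unified_action, mask); masked-out slots carry no data for this
    embodiment, so a single-arm robot never produces targets for unused slots.
    """
    unified = np.zeros(UNIFIED_DIM, dtype=np.float32)
    mask = np.zeros(UNIFIED_DIM, dtype=bool)
    for (start, stop), slot in spec.slot_map.items():
        lo, hi = SLOT_OFFSETS[slot]
        unified[lo:hi] = native_action[start:stop]
        mask[lo:hi] = True
    return unified, mask


# Example: a 7-DoF single-arm robot with a parallel gripper (hypothetical spec).
single_arm = EmbodimentSpec(
    name="franka_like",
    slot_map={(0, 7): "right_arm_eef", (7, 8): "right_gripper"},
)

native = np.random.uniform(-1, 1, size=8).astype(np.float32)
unified, mask = to_unified(native, single_arm)
print(unified.shape, int(mask.sum()))  # (19,) 8
```

In a setup like this, masked slots would simply be excluded from the action loss, which is one plausible way low-resource embodiments could bootstrap from human data and better-covered platforms that populate the same slots.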