
Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization

January 19, 2026
Authors: Hao Luo, Ye Wang, Wanpeng Zhang, Sipeng Zheng, Ziheng Xi, Chaoyi Xu, Haiweng Xu, Haoqi Yuan, Chi Zhang, Yiqing Wang, Yicheng Feng, Zongqing Lu
cs.AI

Abstract

We introduce Being-H0.5, a foundational Vision-Language-Action (VLA) model designed for robust cross-embodiment generalization across diverse robotic platforms. While existing VLAs often struggle with morphological heterogeneity and data scarcity, we propose a human-centric learning paradigm that treats human interaction traces as a universal "mother tongue" for physical interaction. To support this, we present UniHand-2.0, the largest embodied pre-training recipe to date, comprising over 35,000 hours of multimodal data across 30 distinct robotic embodiments. Our approach introduces a Unified Action Space that maps heterogeneous robot controls into semantically aligned slots, enabling low-resource robots to bootstrap skills from human data and high-resource platforms. Built upon this human-centric foundation, we design a unified sequential modeling and multi-task pre-training paradigm to bridge human demonstrations and robotic execution. Architecturally, Being-H0.5 utilizes a Mixture-of-Transformers design featuring a novel Mixture-of-Flow (MoF) framework to decouple shared motor primitives from specialized embodiment-specific experts. Finally, to make cross-embodiment policies stable in the real world, we introduce Manifold-Preserving Gating for robustness under sensory shift and Universal Async Chunking to universalize chunked control across embodiments with different latency and control profiles. We empirically demonstrate that Being-H0.5 achieves state-of-the-art results on simulated benchmarks, such as LIBERO (98.9%) and RoboCasa (53.9%), while also exhibiting strong cross-embodiment capabilities on five robotic platforms.
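The abstract describes the Unified Action Space only at a high level: heterogeneous robot controls are mapped into semantically aligned slots so that different embodiments share one action representation. The sketch below is a minimal, hypothetical illustration of that idea, not the paper's actual interface; the slot names, dimensions, and the EmbodimentSpec helper are assumptions made for this example.

```python
# Hypothetical sketch (not from the paper): scatter each embodiment's native
# action vector into a fixed set of semantically aligned slots, so robots with
# different DoF layouts share a common action representation.
from dataclasses import dataclass
import numpy as np

# Assumed slot layout; names and dimensions are illustrative only.
UNIFIED_SLOTS = {
    "ee_position": 3,   # end-effector translation (x, y, z)
    "ee_rotation": 3,   # end-effector rotation (axis-angle)
    "gripper": 1,       # gripper open/close command
    "base_velocity": 3, # mobile-base linear/angular velocity
}
UNIFIED_DIM = sum(UNIFIED_SLOTS.values())  # 10 dimensions in this sketch

@dataclass
class EmbodimentSpec:
    """Per-embodiment mapping from native action indices to unified slots."""
    name: str
    slot_to_native: dict  # slot name -> indices into the native action vector

def to_unified(native_action: np.ndarray, spec: EmbodimentSpec) -> np.ndarray:
    """Place native action components into the shared slot layout.

    Slots the embodiment does not control stay at zero and could be masked
    out by the policy.
    """
    unified = np.zeros(UNIFIED_DIM, dtype=np.float32)
    offset = 0
    for slot, dim in UNIFIED_SLOTS.items():
        idx = spec.slot_to_native.get(slot)
        if idx is not None:
            unified[offset:offset + dim] = native_action[idx]
        offset += dim
    return unified

# Example: a 7-DoF arm action (6-D end-effector delta + gripper), no base.
arm_spec = EmbodimentSpec(
    name="tabletop_arm",
    slot_to_native={
        "ee_position": [0, 1, 2],
        "ee_rotation": [3, 4, 5],
        "gripper": [6],
        # "base_velocity" omitted: this embodiment has no mobile base.
    },
)
native = np.array([0.01, -0.02, 0.03, 0.0, 0.1, 0.0, 1.0], dtype=np.float32)
print(to_unified(native, arm_spec))  # 10-D vector with base slots zeroed
```

In such a layout, a low-resource embodiment inherits supervision for any slot it shares with human data or better-covered platforms, which is the bootstrapping effect the abstract refers to.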