Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking

June 2, 2026
Authors: Zekun Qi, Xuchuan Chen, Dairu Liu, Chenghuai Lin, Yunrui Lian, Sikai Liang, Zhikai Zhang, Yu Guan, Jilong Wang, Wenyao Zhang, Xinqiang Yu, He Wang, Li Yi
cs.AI

Abstract

We introduce Humanoid-GPT, a GPT-style Transformer with causal attention trained on a billion-scale motion corpus for whole-body control. Unlike prior shallow MLP trackers constrained by scarce data and an agility-generalization trade-off, Humanoid-GPT is pre-trained on a 2B-frame retargeted corpus that unifies all major mocap datasets with large-scale in-house recordings. Scaling both data and model capacity yields a single generative Transformer that tracks highly dynamic behaviors while achieving unprecedented zero-shot generalization to unseen motions and control tasks. Extensive experiments and scaling analyses show that our model establishes a new performance frontier on both axes: tracking highly dynamic, complex motions and generalizing zero-shot to unseen tasks.
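The abstract specifies causal attention but does not say how motion frames are tokenized or projected. The sketch below shows only what the causal constraint means for a sequence of per-frame feature vectors: the output at frame t depends on frames 0..t and never on later frames, which is the property that lets a GPT-style policy run online. It is a minimal single-head illustration under stated assumptions. The function name `causal_self_attention` is hypothetical, there are no learned Q/K/V projections, and it is written in plain Python. It is not the authors' implementation.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row per time step (e.g. one per motion frame)


def causal_self_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Hypothetical single-head scaled dot-product attention with a causal mask.

    Assumption: q, k, v are already-projected per-frame features of equal length.
    Output row t is a softmax-weighted average of v[0..t] only.
    """
    num_steps = len(q)
    scale = 1.0 / math.sqrt(len(q[0]))
    out: Matrix = []
    for t in range(num_steps):
        # Causal mask: frame t may attend only to frames s <= t.
        scores = [
            scale * sum(qi * ki for qi, ki in zip(q[t], k[s]))
            for s in range(t + 1)
        ]
        # Numerically stable softmax over the allowed (past and current) positions.
        peak = max(scores)
        weights = [math.exp(x - peak) for x in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of value rows 0..t.
        out.append(
            [sum(weights[s] * v[s][j] for s in range(t + 1)) for j in range(len(v[0]))]
        )
    return out


if __name__ == "__main__":
    # Toy 2-D "frame features" (illustrative values, not real motion data).
    frames = [[0.1, 0.2], [0.3, -0.1], [0.5, 0.4]]
    out = causal_self_attention(frames, frames, frames)

    # Replacing the last frame must not change the outputs for earlier frames.
    perturbed = frames[:2] + [[9.0, -9.0]]
    out_perturbed = causal_self_attention(perturbed, perturbed, perturbed)
    assert out[:2] == out_perturbed[:2]
```

The demo swaps in a very different final frame and checks that the first two outputs are unchanged. This is the "no peeking at the future" guarantee that a causal mask provides. A production model would use learned projections, multiple heads, and a batched tensor library. The point here is only the masking rule.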