Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking

Abstract

We introduce Humanoid-GPT, a GPT-style Transformer with causal attention, trained on a billion-scale motion corpus for whole-body control. Unlike prior shallow MLP trackers, which are constrained by scarce data and an agility-generalization trade-off, Humanoid-GPT is pre-trained on a 2B-frame retargeted corpus that unifies all major mocap datasets with large-scale in-house recordings. Scaling both data and model capacity yields a single generative Transformer that tracks highly dynamic behaviors while achieving unprecedented zero-shot generalization to unseen motions and control tasks. Extensive experiments and scaling analyses show that the model establishes a new performance frontier, combining robust zero-shot generalization to unseen tasks with tracking of highly dynamic and complex motions.