IAM: Identity-Aware Human Motion and Shape Joint Generation

Abstract

Recent advances in text-driven human motion generation enable models to synthesize realistic motion sequences from natural language descriptions. However, most existing approaches assume identity-neutral motion and generate movements using a canonical body representation, ignoring the strong influence of body morphology on motion dynamics. In practice, attributes such as body proportions, mass distribution, and age significantly affect how actions are performed, and neglecting this coupling often leads to physically inconsistent motions. We propose an identity-aware motion generation framework that explicitly models the relationship between body morphology and motion dynamics. Instead of relying on explicit geometric measurements, identity is represented using multimodal signals, including natural language descriptions and visual cues. We further introduce a joint motion-shape generation paradigm that simultaneously synthesizes motion sequences and body shape parameters, allowing identity cues to directly modulate motion dynamics. Extensive experiments on motion capture datasets and large-scale in-the-wild videos demonstrate improved motion realism and motion-identity consistency while maintaining high motion quality. Project page: https://vjwq.github.io/IAM
