Unified Number-Free Text-to-Motion Generation Via Flow Matching

Abstract

Generative models excel at motion synthesis for a fixed number of agents but struggle to generalize to a variable number of agents. Relying on limited, domain-specific data, existing methods employ autoregressive models to generate motion recursively and suffer from inefficiency and error accumulation. We propose Unified Motion Flow (UMF), which consists of Pyramid Motion Flow (P-Flow) and Semi-Noise Motion Flow (S-Flow). UMF decomposes number-free motion generation into a single-pass motion prior generation stage and multi-pass reaction generation stages. Specifically, UMF utilizes a unified latent space to bridge the distribution gap between heterogeneous motion datasets, enabling effective unified training. For motion prior generation, P-Flow operates on hierarchical resolutions conditioned on different noise levels, thereby reducing computational overhead. For reaction generation, S-Flow learns a joint probabilistic path that adaptively performs reaction transformation and context reconstruction, alleviating error accumulation. Extensive results and user studies demonstrate UMF's effectiveness as a generalist model for multi-person motion generation from text. Project page: https://githubhgh.github.io/umf/
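The two-stage decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`euler_flow`, `generate_group`), the velocity-field signature, and the toy velocity fields are all hypothetical. Text conditioning, the unified latent space, P-Flow's hierarchical resolutions, and S-Flow's joint path are omitted. The sketch also assumes that every stage starts from a caller-supplied initial sample, which may differ from how S-Flow is initialized. It shows only the control flow: one flow rollout for the first agent, then one rollout per additional agent, conditioned on the motions generated so far.

```python
from typing import Callable, List

Vector = List[float]
# Hypothetical signature: velocity(x, t, context) -> dx/dt at time t.
Velocity = Callable[[Vector, float, List[Vector]], Vector]


def euler_flow(x0: Vector, velocity: Velocity, context: List[Vector],
               steps: int = 8) -> Vector:
    """Integrate dx/dt = velocity(x, t, context) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays strictly below 1 inside the loop
        v = velocity(x, t, context)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def generate_group(num_agents: int,
                   init_fn: Callable[[], Vector],
                   prior_velocity: Velocity,
                   reaction_velocity: Velocity,
                   steps: int = 8) -> List[Vector]:
    """Generate one motion vector per agent for an arbitrary agent count."""
    if num_agents < 1:
        raise ValueError("num_agents must be at least 1")
    # Stage 1 (single pass): motion prior for the first agent.
    motions = [euler_flow(init_fn(), prior_velocity, [], steps)]
    # Stage 2 (one pass per extra agent): reaction conditioned on prior motions.
    while len(motions) < num_agents:
        motions.append(euler_flow(init_fn(), reaction_velocity, motions, steps))
    return motions


# Toy stand-ins for trained networks (assumptions for illustration only).
def toy_prior_velocity(x: Vector, t: float, context: List[Vector]) -> Vector:
    target = [1.0, 2.0]
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]


def toy_reaction_velocity(x: Vector, t: float, context: List[Vector]) -> Vector:
    target = [-c for c in context[-1]]  # mirror the most recent agent
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
```

Calling `generate_group(3, lambda: [0.0, 0.0], toy_prior_velocity, toy_reaction_velocity)` returns three vectors, one per agent. Because each reaction rollout receives the full list of earlier motions, the agent count is a runtime argument rather than a fixed property of the model.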
