MotionFlux: Efficient Text-Guided Motion Generation through Rectified Flow Matching and Preference Alignment

Abstract

Motion generation is essential for animating virtual characters and embodied agents. Recent text-driven methods have made significant progress, but they often struggle to align linguistic descriptions precisely with motion semantics, and they suffer from slow, multi-step inference. To address these issues, we introduce TMR++ Aligned Preference Optimization (TAPO), a framework that aligns subtle motion variations with textual modifiers and incorporates iterative adjustments to reinforce semantic grounding. To enable real-time synthesis, we further propose MotionFLUX, a high-speed generation framework based on deterministic rectified flow matching. Unlike traditional diffusion models, which require hundreds of denoising steps, MotionFLUX constructs optimal transport paths between the noise distribution and the motion space. The resulting linearized probability paths reduce the need for the multi-step sampling typical of sequential methods, significantly reducing inference time without sacrificing motion quality. Experimental results demonstrate that, together, TAPO and MotionFLUX form a unified system that outperforms state-of-the-art approaches in both semantic consistency and motion quality while also generating motion faster. The code and pretrained models will be released.
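
The abstract's claim that linearized probability paths reduce the need for multi-step sampling can be illustrated with a small toy sketch. This is not the MotionFLUX implementation: the function names, the 4-dimensional "motion" vectors, and the affine noise-to-data coupling (`SCALE`, `SHIFT`) are all assumptions chosen so that the velocity field can be written in closed form instead of being learned by a network. The sketch shows the straight-line interpolation, the constant velocity target used for regression in rectified flow matching, and an Euler sampler.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight path between noise x0 and data x1 at time t."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for the velocity field: constant along a straight path."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(velocity_fn, x0, x1, t):
    """Mean squared error between predicted and target velocity at x_t."""
    xt = interpolate(x0, x1, t)
    pred = velocity_fn(xt, t)
    tgt = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(pred, tgt)) / len(tgt)


def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-size Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy coupling (an assumption for illustration): data = SCALE * noise + SHIFT.
SCALE, SHIFT = 0.5, 2.0


def toy_velocity(x, t):
    """Closed-form velocity field of the straight paths for the toy coupling.

    A real model would replace this with a trained, text-conditioned network.
    """
    out = []
    for xi in x:
        # Recover the starting noise value from the current point on the path.
        x0 = (xi - SHIFT * t) / (1.0 - t + SCALE * t)
        out.append((SCALE - 1.0) * x0 + SHIFT)
    return out


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
data = [SCALE * z + SHIFT for z in noise]

loss = flow_matching_loss(toy_velocity, noise, data, t=0.3)
one_step = euler_sample(toy_velocity, noise, num_steps=1)
ten_steps = euler_sample(toy_velocity, noise, num_steps=10)
```

Because the velocity is constant along each straight path, a single Euler step in this toy setting lands on the same endpoint as ten steps, up to floating-point error. With a learned velocity field the paths are only approximately straight, so the number of steps needed in practice depends on the trained model.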
