MotionFlux: Efficient Text-Guided Motion Generation through Rectified Flow Matching and Preference Alignment

August 27, 2025
Authors: Zhiting Gao, Dan Song, Diqiong Jiang, Chao Xue, An-An Liu
cs.AI

Abstract

Motion generation is essential for animating virtual characters and embodied agents. While recent text-driven methods have made significant strides, they often struggle with achieving precise alignment between linguistic descriptions and motion semantics, as well as with the inefficiencies of slow, multi-step inference. To address these issues, we introduce TMR++ Aligned Preference Optimization (TAPO), an innovative framework that aligns subtle motion variations with textual modifiers and incorporates iterative adjustments to reinforce semantic grounding. To further enable real-time synthesis, we propose MotionFLUX, a high-speed generation framework based on deterministic rectified flow matching. Unlike traditional diffusion models, which require hundreds of denoising steps, MotionFLUX constructs optimal transport paths between noise distributions and motion spaces, facilitating real-time synthesis. The linearized probability paths reduce the need for multi-step sampling typical of sequential methods, significantly accelerating inference time without sacrificing motion quality. Experimental results demonstrate that, together, TAPO and MotionFLUX form a unified system that outperforms state-of-the-art approaches in both semantic consistency and motion quality, while also accelerating generation speed. The code and pretrained models will be released.
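As a rough illustration of the rectified flow matching idea described in the abstract (not the authors' code), the sketch below trains a velocity network on straight-line paths between Gaussian noise and motion samples, then generates motions with a few Euler integration steps. The module `VelocityNet`, the flattened motion and text embeddings, and the 4-step sampler are hypothetical placeholders chosen for brevity.

```python
# Minimal sketch of conditional rectified flow matching (illustrative only).
# Assumptions: motion clips are flattened to fixed-size vectors, and a text
# encoder has already produced a conditioning embedding c.
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Toy velocity predictor v_theta(x_t, t, c); stands in for the real motion model."""
    def __init__(self, motion_dim: int, text_dim: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(motion_dim + text_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, motion_dim),
        )

    def forward(self, x_t, t, c):
        # Concatenate noisy motion, text embedding, and scalar time.
        return self.net(torch.cat([x_t, c, t[:, None]], dim=-1))

def rectified_flow_loss(model, x1, c):
    """Regress the constant velocity x1 - x0 along the straight path x_t = (1-t)*x0 + t*x1."""
    x0 = torch.randn_like(x1)                      # noise endpoint
    t = torch.rand(x1.shape[0], device=x1.device)  # uniform time in [0, 1]
    x_t = (1 - t[:, None]) * x0 + t[:, None] * x1  # point on the linear path
    v_target = x1 - x0                             # constant velocity target
    return ((model(x_t, t, c) - v_target) ** 2).mean()

@torch.no_grad()
def sample(model, c, motion_dim, steps: int = 4):
    """Few-step Euler integration of dx/dt = v_theta(x, t, c) from noise to motion."""
    x = torch.randn(c.shape[0], motion_dim, device=c.device)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((c.shape[0],), i * dt, device=c.device)
        x = x + dt * model(x, t, c)
    return x
```

Because the regression target x1 - x0 is constant along each straight path, a well-trained model traces nearly linear trajectories, which is why a handful of integration steps can suffice, in contrast to the hundreds of denoising steps required by standard diffusion samplers.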