MotionFlux: Efficient Text-Guided Motion Generation through Rectified Flow Matching and Preference Alignment

Abstract

Motion generation is essential for animating virtual characters and embodied agents. Although recent text-driven methods have made significant progress, they often struggle to achieve precise alignment between linguistic descriptions and motion semantics, and they suffer from slow, multi-step inference. To address these issues, we introduce TMR++ Aligned Preference Optimization (TAPO), a framework that aligns subtle motion variations with textual modifiers and incorporates iterative adjustments to reinforce semantic grounding. To enable real-time synthesis, we further propose MotionFLUX, a high-speed generation framework based on deterministic rectified flow matching. Unlike traditional diffusion models, which require hundreds of denoising steps, MotionFLUX constructs optimal transport paths between the noise distribution and the motion space. These linearized probability paths reduce the need for the multi-step sampling typical of sequential methods, significantly shortening inference time without sacrificing motion quality. Experimental results demonstrate that TAPO and MotionFLUX together form a unified system that outperforms state-of-the-art approaches in both semantic consistency and motion quality while also generating motion faster. The code and pretrained models will be released.
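The abstract does not specify MotionFLUX's training objective or sampler, so the following is only a minimal sketch of the generic rectified-flow idea it builds on, not the authors' implementation. Training regresses a velocity field onto the straight-line target `x1 - x0` along the linear path `x_t = (1 - t) * x0 + t * x1`; sampling integrates that field from noise (`t = 0`) to data (`t = 1`) with a few Euler steps. All names (`interpolate`, `target_velocity`, `flow_matching_loss`, `euler_sample`, `oracle_velocity`, `SCALE`, `SHIFT`) are hypothetical, and the toy "oracle" velocity stands in for a trained, text-conditioned network. It shows why straight paths permit few-step sampling: when the velocity is constant along each trajectory, even a single Euler step lands on the data.

```python
import random


def interpolate(x0, x1, t):
    """Point at time t on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Flow-matching regression target on a linear path: dx_t/dt = x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(v_pred, x0, x1):
    """Mean squared error between a predicted velocity and the straight-line target."""
    target = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, target)) / len(target)


def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with a fixed number of Euler steps."""
    x, dt = list(x0), 1.0 / num_steps
    for k in range(num_steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical toy coupling: each noise sample x0 is paired with data x1 = SCALE * x0 + SHIFT.
SCALE, SHIFT = 0.5, 2.0


def oracle_velocity(x, t):
    """Exact straight-path velocity for the toy coupling (stand-in for a trained network)."""
    out = []
    for xt in x:
        # Recover the path's starting point from x_t = x0 * (1 - t + t * SCALE) + t * SHIFT.
        x0 = (xt - t * SHIFT) / (1.0 - t + t * SCALE)
        out.append((SCALE - 1.0) * x0 + SHIFT)  # equals x1 - x0, constant along the path
    return out


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4)]
data = [SCALE * z + SHIFT for z in noise]

# Loss of the oracle at an intermediate time should be (numerically) zero.
loss = flow_matching_loss(oracle_velocity(interpolate(noise, data, 0.3), 0.3), noise, data)

# Straight paths: one Euler step and many Euler steps both reach the data.
one_step = euler_sample(oracle_velocity, noise, num_steps=1)
many_steps = euler_sample(oracle_velocity, noise, num_steps=100)
print(max(abs(a - b) for a, b in zip(one_step, data)))
```

In a real model the learned paths are only approximately straight, which is why a small number of steps, rather than exactly one, is typically used.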
