ChatPaper.ai


Unified Number-Free Text-to-Motion Generation Via Flow Matching

March 27, 2026
Authors: Guanhe Huang, Oya Celiktutan
cs.AI

Abstract

Generative models excel at motion synthesis for a fixed number of agents but struggle to generalize to a variable number of agents. Trained on limited, domain-specific data, existing methods employ autoregressive models to generate motion recursively, which suffers from inefficiency and error accumulation. We propose Unified Motion Flow (UMF), which consists of Pyramid Motion Flow (P-Flow) and Semi-Noise Motion Flow (S-Flow). UMF decomposes number-free motion generation into a single-pass motion prior generation stage and multi-pass reaction generation stages. Specifically, UMF utilizes a unified latent space to bridge the distribution gap between heterogeneous motion datasets, enabling effective unified training. For motion prior generation, P-Flow operates on hierarchical resolutions conditioned on different noise levels, thereby mitigating computational overhead. For reaction generation, S-Flow learns a joint probabilistic path that adaptively performs reaction transformation and context reconstruction, alleviating error accumulation. Extensive results and user studies demonstrate UMF's effectiveness as a generalist model for multi-person motion generation from text. Project page: https://githubhgh.github.io/umf/.
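For context, the flow-matching objective that UMF's P-Flow and S-Flow build on regresses a velocity field along an interpolation path between noise and data. Below is a minimal, generic sketch of that objective (standard rectified-flow formulation, not the authors' code; array shapes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_target(x0, x1, t):
    """Linear (rectified-flow) path between noise x0 and data x1.

    Returns the point x_t on the path at time t in [0, 1] and the
    constant velocity target x1 - x0 that a flow-matching model
    v(x_t, t) is trained to regress via an MSE loss.
    """
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return x_t, v_target

# Toy "motion" sample: 8 frames x 3 coordinates (shapes are illustrative).
x1 = rng.standard_normal((8, 3))   # data sample
x0 = rng.standard_normal((8, 3))   # Gaussian noise sample
t = rng.uniform()                  # random time along the path

x_t, v_target = flow_matching_target(x0, x1, t)

# A trained velocity model would minimize mean((model(x_t, t) - v_target)**2);
# with an exact model the loss is zero.
loss_if_perfect = np.mean((v_target - v_target) ** 2)
```

In UMF the same objective is applied in a unified latent space, at multiple resolutions (P-Flow) and over a joint path that mixes noised and clean agent contexts (S-Flow), per the abstract.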
PDF · April 1, 2026