MixerMDM: Learnable Composition of Human Motion Diffusion Models
April 1, 2025
Authors: Pablo Ruiz-Ponce, German Barquero, Cristina Palmero, Sergio Escalera, José García-Rodríguez
cs.AI
Abstract
Generating human motion guided by conditions such as textual descriptions is
challenging due to the need for datasets with pairs of high-quality motion and
their corresponding conditions. The difficulty increases when aiming for finer
control in the generation. To that end, prior works have proposed to combine
several motion diffusion models pre-trained on datasets with different types of
conditions, thus allowing control with multiple conditions. However, the
proposed merging strategies overlook that the optimal way to combine the
generation processes might depend on the particularities of each pre-trained
generative model and also the specific textual descriptions. In this context,
we introduce MixerMDM, the first learnable model composition technique for
combining pre-trained text-conditioned human motion diffusion models. Unlike
previous approaches, MixerMDM provides a dynamic mixing strategy that is
trained in an adversarial fashion to learn to combine the denoising process of
each model depending on the set of conditions driving the generation. By using
MixerMDM to combine single- and multi-person motion diffusion models, we
achieve fine-grained control over the dynamics of each person individually, as
well as over the overall interaction. Furthermore, we propose a new evaluation
technique that, for the first time in this task, measures the interaction and
individual quality by computing the alignment between the mixed generated
motions and their conditions as well as the capabilities of MixerMDM to adapt
the mixing throughout the denoising process depending on the motions to mix.
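
To make the idea of combining two denoising processes concrete, here is a minimal sketch of one mixing step, assuming each pre-trained model returns a denoised motion prediction of the same shape and that the mixer outputs one weight per frame. The function names, the array shapes, and the hand-written `toy_mixer` heuristic are all illustrative assumptions; in MixerMDM the weights come from a module trained adversarially, which this sketch does not reproduce.

```python
import numpy as np


def mix_predictions(x_inter, x_indiv, weights):
    """Convex blend of two denoised motion predictions.

    x_inter, x_indiv: arrays of shape (frames, features), assumed to come from
        an interaction model and an individual-motion model respectively.
    weights: scalar or array broadcastable to that shape, clipped to [0, 1];
        1 keeps the interaction prediction, 0 keeps the individual one.
    """
    weights = np.clip(weights, 0.0, 1.0)
    return weights * x_inter + (1.0 - weights) * x_indiv


def toy_mixer(x_inter, x_indiv, t, num_steps):
    """Hypothetical stand-in for a learned mixer: one weight per frame.

    This is a fixed heuristic, not a trained network. It only shows that the
    weights may depend on the two predictions and on the denoising step t.
    """
    # Per-frame disagreement between the two predictions, shape (frames, 1).
    disagreement = np.abs(x_inter - x_indiv).mean(axis=1, keepdims=True)
    # 0 at the first denoising step (t = num_steps), close to 1 near the end.
    progress = 1.0 - t / num_steps
    logits = progress - disagreement
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid, values in (0, 1)


# Toy data standing in for the two models' outputs at a single step.
rng = np.random.default_rng(0)
frames, features, num_steps = 4, 6, 10
x_inter = rng.normal(size=(frames, features))
x_indiv = rng.normal(size=(frames, features))

w = toy_mixer(x_inter, x_indiv, t=5, num_steps=num_steps)
mixed = mix_predictions(x_inter, x_indiv, w)
print(w.shape, mixed.shape)
```

Because the blend is convex, every mixed value lies between the two input predictions; a learned mixer would replace `toy_mixer` and could emit weights at a different granularity, for example one global weight or one weight per joint.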