
Motion Anything: Any to Motion Generation

March 10, 2025
Authors: Zeyu Zhang, Yiran Wang, Wei Mao, Danning Li, Rui Zhao, Biao Wu, Zirui Song, Bohan Zhuang, Ian Reid, Richard Hartley
cs.AI

Abstract

Conditional motion generation has been extensively studied in computer vision, yet two critical challenges remain. First, while masked autoregressive methods have recently outperformed diffusion-based approaches, existing masking models lack a mechanism to prioritize dynamic frames and body parts based on given conditions. Second, existing methods for different conditioning modalities often fail to integrate multiple modalities effectively, limiting control and coherence in generated motion. To address these challenges, we propose Motion Anything, a multimodal motion generation framework that introduces an Attention-based Mask Modeling approach, enabling fine-grained spatial and temporal control over key frames and actions. Our model adaptively encodes multimodal conditions, including text and music, improving controllability. Additionally, we introduce Text-Music-Dance (TMD), a new motion dataset consisting of 2,153 pairs of text, music, and dance, making it twice the size of AIST++, thereby filling a critical gap in the community. Extensive experiments demonstrate that Motion Anything surpasses state-of-the-art methods across multiple benchmarks, achieving a 15% improvement in FID on HumanML3D and showing consistent performance gains on AIST++ and TMD. See our project website https://steve-zeyu-zhang.github.io/MotionAnything
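The abstract does not specify how the Attention-based Mask Modeling selects dynamic frames and body parts, so the following is a minimal sketch of one plausible reading, not the authors' method: cross-attention salience between condition embeddings (text or music) and motion tokens picks which tokens to mask, replacing the random masking typical of masked autoregressive models. All names, shapes, the mask ratio, and the top-k selection rule are illustrative assumptions.

```python
# Hypothetical sketch of condition-guided masking; not the paper's code.
import torch

def attention_guided_mask(motion_tokens, cond_tokens, mask_ratio=0.4):
    """
    motion_tokens: (B, T, D) motion token embeddings (T = frames x body parts)
    cond_tokens:   (B, L, D) condition embeddings (text/music features)
    Returns a boolean mask of shape (B, T): True = token is masked.
    """
    B, T, D = motion_tokens.shape
    # Cross-attention salience: how strongly each motion token attends
    # to the conditioning sequence (scaled dot-product scores).
    attn = torch.einsum("btd,bld->btl", motion_tokens, cond_tokens) / D**0.5
    salience = attn.softmax(dim=-1).max(dim=-1).values  # (B, T)
    # Mask the top-k most condition-relevant tokens rather than random ones,
    # so reconstruction focuses on condition-critical frames/parts.
    k = max(1, int(mask_ratio * T))
    topk = salience.topk(k, dim=-1).indices
    mask = torch.zeros(B, T, dtype=torch.bool, device=motion_tokens.device)
    mask.scatter_(1, topk, True)
    return mask

# Usage: masked positions would be replaced by a learnable [MASK] embedding
# and reconstructed by the transformer, conditioned on the text/music input.
B, T, L, D = 2, 196, 77, 256
motion, cond = torch.randn(B, T, D), torch.randn(B, L, D)
mask = attention_guided_mask(motion, cond)
print(mask.shape, mask.sum(dim=1))  # torch.Size([2, 196]), ~78 masked each
```

Under this reading, the masking itself becomes condition-aware: the tokens the model must in-paint are exactly those most relevant to the given text or music, which is one way the claimed fine-grained spatial and temporal control could be realized.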

