MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation
May 7, 2026
Authors: Kaixing Yang, Jiashu Zhu, Xulong Tang, Ziqiao Peng, Xiangyue Zhang, Puwei Wang, Jiahong Wu, Xiangxiang Chu, Hongyan Liu, Jun He
cs.AI
Abstract
With the rise of online dance-video platforms and rapid advances in AI-generated content (AIGC), music-driven dance generation has emerged as a compelling research direction. Despite substantial progress in related domains such as music-driven 3D dance generation, pose-driven image animation, and audio-driven talking-head synthesis, existing methods cannot be directly adapted to this task. Moreover, the few studies in this area still struggle to jointly achieve high-quality visual appearance and realistic human motion. Accordingly, we present MACE-Dance, a music-driven dance video generation framework built on cascaded Mixture-of-Experts (MoE). The Motion Expert performs music-to-3D motion generation while enforcing kinematic plausibility and artistic expressiveness, whereas the Appearance Expert carries out motion- and reference-conditioned video synthesis, preserving visual identity with spatiotemporal coherence. Specifically, the Motion Expert adopts a diffusion model with a BiMamba-Transformer hybrid architecture and a Guidance-Free Training (GFT) strategy, achieving state-of-the-art (SOTA) performance in 3D dance generation. The Appearance Expert employs a decoupled kinematic-aesthetic fine-tuning strategy, achieving SOTA performance in pose-driven image animation. To better benchmark this task, we curate a large-scale and diverse dataset and design a joint motion-appearance evaluation protocol. Under this protocol, MACE-Dance likewise achieves the best overall performance. Code is available at https://github.com/AMAP-ML/MACE-Dance.
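The two-stage cascade described above can be sketched as a minimal pipeline: the Motion Expert maps music features to a 3D motion sequence, which then drives the Appearance Expert's reference-conditioned video synthesis. All class and method names below are illustrative assumptions, not the actual MACE-Dance API; the real diffusion-based implementations live in the linked repository.

```python
# Hypothetical sketch of the cascaded Motion -> Appearance expert pipeline.
# Names (MotionExpert, AppearanceExpert, mace_dance_pipeline) are assumptions
# for illustration only, not the repository's actual interfaces.
from dataclasses import dataclass
from typing import Any, List


@dataclass
class MotionSequence:
    """Stand-in for the 3D pose sequence produced by the Motion Expert."""
    frames: List[List[float]]  # one pose vector per time step


class MotionExpert:
    """Stage 1: music -> 3D motion (a diffusion model in the paper)."""

    def generate(self, music_features: List[float]) -> MotionSequence:
        # Placeholder: the real model denoises a motion sequence conditioned
        # on music via a BiMamba-Transformer backbone trained with GFT.
        return MotionSequence(frames=[[0.0, 0.0, 0.0] for _ in music_features])


class AppearanceExpert:
    """Stage 2: (motion, reference image) -> dance video frames."""

    def synthesize(self, motion: MotionSequence, reference: Any) -> List[Any]:
        # Placeholder: the real model renders identity-preserving frames
        # conditioned on the driving poses and the reference appearance.
        return [("frame", pose, reference) for pose in motion.frames]


def mace_dance_pipeline(music_features: List[float], reference: Any) -> List[Any]:
    """Cascade the experts: Motion Expert output feeds the Appearance Expert."""
    motion = MotionExpert().generate(music_features)
    return AppearanceExpert().synthesize(motion, reference)


video = mace_dance_pipeline(music_features=[0.1, 0.2, 0.3], reference="ref.png")
print(len(video))  # one synthesized frame per music time step -> 3
```

The design point the sketch captures is the decoupling: motion quality and visual appearance are optimized by separate experts, and the only interface between them is the pose sequence.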