Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM

March 12, 2024
Authors: Zeyu Zhang, Akide Liu, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang
cs.AI

Abstract

Human motion generation is a significant pursuit in generative computer vision, yet achieving long-sequence and efficient motion generation remains challenging. Recent advances in state space models (SSMs), notably Mamba, have shown considerable promise in long-sequence modeling with an efficient hardware-aware design, which appears to be a promising direction on which to build motion generation models. Nevertheless, adapting SSMs to motion generation is difficult because no specialized architecture exists for modeling motion sequences. To address these challenges, we propose Motion Mamba, a simple and efficient approach that presents a pioneering motion generation model built on SSMs. Specifically, we design a Hierarchical Temporal Mamba (HTM) block that processes temporal data by ensembling varying numbers of isolated SSM modules across a symmetric U-Net architecture, with the aim of preserving motion consistency between frames. We also design a Bidirectional Spatial Mamba (BSM) block that processes latent poses bidirectionally to improve the accuracy of motion generation within a temporal frame. Our proposed method achieves up to 50% FID improvement and is up to 4 times faster on the HumanML3D and KIT-ML datasets compared to the previous best diffusion-based method, demonstrating strong capabilities in high-quality long-sequence motion modeling and real-time human motion generation. See the project website: https://steve-zeyu-zhang.github.io/MotionMamba/
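The two ideas named in the abstract, scanning a sequence in both directions and varying the number of scans symmetrically across a U-Net, can be illustrated with a toy sketch. This is not the authors' implementation. It uses a fixed scalar linear recurrence rather than Mamba's selective, input-dependent SSM, and the function names (`ssm_scan`, `bidirectional_scan`, `symmetric_scan_counts`), the parameter values, the choice to sum the two directions, and the direction in which scan counts vary with depth are all assumptions made for illustration.

```python
from typing import List


def ssm_scan(xs: List[float], a: float = 0.5, b: float = 1.0, c: float = 1.0) -> List[float]:
    """Toy scalar linear state-space recurrence (non-selective).

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def bidirectional_scan(xs: List[float], a: float = 0.5, b: float = 1.0, c: float = 1.0) -> List[float]:
    """Scan the sequence left-to-right and right-to-left, then sum the outputs.

    Summation is an assumed way of combining the two directions.
    """
    forward = ssm_scan(xs, a, b, c)
    # Reverse the input, scan, then reverse the output back into the original order.
    backward = ssm_scan(xs[::-1], a, b, c)[::-1]
    return [f + r for f, r in zip(forward, backward)]


def symmetric_scan_counts(depth: int) -> List[int]:
    """Hypothetical schedule of scan counts mirrored around a U-Net bottleneck."""
    down = list(range(depth, 0, -1))  # encoder side, e.g. depth=3 gives [3, 2, 1]
    return down + down[-2::-1]        # mirror without repeating the bottleneck


if __name__ == "__main__":
    print(bidirectional_scan([1.0, 0.0, 0.0]))
    print(symmetric_scan_counts(3))
```

In this toy version, an impulse at the first position influences later positions through the forward scan only, while the backward scan lets each position also see what follows it. A one-directional recurrence cannot provide that.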
