

Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM

March 12, 2024
Authors: Zeyu Zhang, Akide Liu, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang
cs.AI

Abstract

Human motion generation stands as a significant pursuit in generative computer vision, while achieving long-sequence and efficient motion generation remains challenging. Recent advancements in state space models (SSMs), notably Mamba, have showcased considerable promise in long-sequence modeling with an efficient, hardware-aware design, making them a promising foundation on which to build motion generation models. Nevertheless, adapting SSMs to motion generation faces hurdles due to the lack of a specialized architecture for modeling motion sequences. To address these challenges, we propose Motion Mamba, a simple and efficient approach that presents a pioneering motion generation model built on SSMs. Specifically, we design a Hierarchical Temporal Mamba (HTM) block that processes temporal data by ensembling varying numbers of isolated SSM modules across a symmetric U-Net architecture, aiming to preserve motion consistency between frames. We also design a Bidirectional Spatial Mamba (BSM) block that processes latent poses bidirectionally to enhance accurate motion generation within a temporal frame. Our proposed method achieves up to a 50% FID improvement and up to a 4× speedup on the HumanML3D and KIT-ML datasets compared to the previous best diffusion-based method, demonstrating strong capabilities for high-quality long-sequence motion modeling and real-time human motion generation. See the project website https://steve-zeyu-zhang.github.io/MotionMamba/
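The abstract does not spell out the SSM equations, so as a rough illustration of what "bidirectional" processing means for a selective SSM, here is a minimal Python sketch on a generic 1-D sequence. It implements a toy single-channel Mamba-style selective scan (diagonal state matrix, zero-order-hold discretization of A, simplified Euler step for B, input-dependent Δ, B and C) plus a wrapper that runs one scan forward, one over the reversed sequence, and sums the two. Everything here is an assumption for illustration: the names `selective_scan`, `bidirectional_scan`, `Params`, the scalar projections `w_delta`, `w_b`, `w_c`, and the choice to merge directions by summation are hypothetical and are not taken from the Motion Mamba code, whose BSM block operates on multi-channel latent poses with learned layers.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical parameter bundle for one scan direction: (a, w_delta, w_b, w_c)
Params = Tuple[Sequence[float], float, Sequence[float], Sequence[float]]


def softplus(z: float) -> float:
    """Numerically stable softplus; keeps the step size delta positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(xs: Sequence[float], a: Sequence[float], w_delta: float,
                   w_b: Sequence[float], w_c: Sequence[float]) -> List[float]:
    """Toy single-channel selective SSM with a diagonal state matrix.

    For each step t (use a[i] < 0 for a stable, decaying state):
        delta_t  = softplus(w_delta * x_t)          # input-dependent step size
        B_t, C_t = w_b * x_t, w_c * x_t             # input-dependent projections
        h_t[i]   = exp(delta_t * a[i]) * h_{t-1}[i] + delta_t * B_t[i] * x_t
        y_t      = sum_i C_t[i] * h_t[i]
    """
    h = [0.0] * len(a)
    ys: List[float] = []
    for x in xs:
        delta = softplus(w_delta * x)
        b = [wb * x for wb in w_b]
        c = [wc * x for wc in w_c]
        # Zero-order hold for A, simplified Euler step (delta * B) for B.
        h = [math.exp(delta * ai) * hi + delta * bi * x
             for ai, hi, bi in zip(a, h, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


def bidirectional_scan(xs: Sequence[float], fwd: Params, bwd: Params) -> List[float]:
    """Scan left-to-right and right-to-left, then sum per position.

    Summation is an assumed merge rule; gating or concatenation are alternatives.
    """
    forward = selective_scan(xs, *fwd)
    # Scan the reversed sequence, then flip back to align with the original order.
    backward = selective_scan(list(reversed(xs)), *bwd)[::-1]
    return [f + b for f, b in zip(forward, backward)]


# Usage: same toy parameters for both directions (an arbitrary choice).
params: Params = ([-1.0, -0.5], 0.8, [1.0, 0.5], [0.7, -0.2])
xs = [0.5, -1.0, 2.0, 0.3]
print(selective_scan(xs, *params))
print(bidirectional_scan(xs, params, params))
```

In the forward-only scan, the output at position 0 cannot change when a later input changes; in the bidirectional version it can, which is the property a bidirectional block relies on so that every position sees the whole sequence.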

