KMM: Key Frame Mask Mamba for Extended Motion Generation

Abstract

Human motion generation is a cutting-edge area of research in generative computer vision, with promising applications in video creation, game development, and robotic manipulation. The recent Mamba architecture shows promising results in efficiently modeling long and complex sequences, yet two significant challenges remain. First, directly applying Mamba to extended motion generation is ineffective, because the limited capacity of its implicit memory leads to memory decay. Second, Mamba struggles with multimodal fusion compared to Transformers and lacks alignment with textual queries, often confusing directions (left or right) or omitting parts of longer text queries. To address these challenges, our paper presents three key contributions. First, we introduce KMM, a novel architecture featuring Key frame Masking Modeling, designed to enhance Mamba's focus on key actions in motion segments. This approach addresses the memory decay problem and represents a pioneering method for customizing strategic frame-level masking in state space models (SSMs). Second, we design a contrastive learning paradigm that addresses the multimodal fusion problem in Mamba and improves motion-text alignment. Finally, we conduct extensive experiments on the widely used BABEL dataset, achieving state-of-the-art performance while reducing FID by more than 57% and the parameter count by 70% compared to previous state-of-the-art methods. See project website: https://steve-zeyu-zhang.github.io/KMM
