MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space
March 19, 2025
Authors: Lixing Xiao, Shunlin Lu, Huaijin Pi, Ke Fan, Liang Pan, Yueer Zhou, Ziyong Feng, Xiaowei Zhou, Sida Peng, Jingbo Wang
cs.AI
Abstract
This paper addresses the challenge of text-conditioned streaming motion
generation, which requires us to predict the next-step human pose based on
variable-length historical motions and incoming texts. Existing methods
struggle to achieve streaming motion generation: diffusion models are
constrained by pre-defined motion lengths, while GPT-based methods suffer from
delayed responses and error accumulation due to discretized, non-causal
tokenization. To solve these problems, we propose MotionStreamer, a novel
framework that incorporates a continuous causal latent space into a
probabilistic autoregressive model. The continuous latents mitigate information
loss caused by discretization and effectively reduce error accumulation during
long-term autoregressive generation. In addition, by establishing temporal
causal dependencies between current and historical motion latents, our model
fully utilizes the available information to achieve accurate online motion
decoding. Experiments show that our method outperforms existing approaches
while offering more applications, including multi-round generation, long-term
generation, and dynamic motion composition. Project Page:
https://zju3dv.github.io/MotionStreamer/
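To make the streaming setup concrete, below is a minimal, self-contained sketch of the core loop described in the abstract: a probabilistic autoregressive model samples the next continuous motion latent conditioned on the text and the variable-length latent history, and a causal decoder emits a pose immediately at each step. All module names, dimensions, and the GRU backbone are illustrative assumptions for exposition, not the paper's actual architecture; termination handling (the model deciding when the motion ends) is reduced to a fixed step budget for brevity.

```python
# Minimal sketch (assumptions, not the paper's implementation) of streaming
# motion generation with continuous latents and causal online decoding.
import torch
import torch.nn as nn

class ContinuousARModel(nn.Module):
    """Predicts the next continuous latent from text + latent history."""
    def __init__(self, latent_dim=16, text_dim=32, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(latent_dim + text_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2 * latent_dim)  # mean and log-variance

    def sample_next(self, text_emb, history):
        # Condition every step on the text embedding (simple concatenation).
        x = torch.cat([history, text_emb.expand(-1, history.size(1), -1)], -1)
        h = self.rnn(x)[0][:, -1]                      # last hidden state
        mean, logvar = self.head(h).chunk(2, dim=-1)
        # Sampling a continuous latent avoids quantizing to a discrete codebook.
        return mean + torch.randn_like(mean) * (0.5 * logvar).exp()

class CausalDecoder(nn.Module):
    """Maps a latent to a pose using only current and past information,
    so each pose can be decoded online as soon as its latent arrives."""
    def __init__(self, latent_dim=16, pose_dim=66):
        super().__init__()
        # Stand-in for a causal conv/transformer decoder.
        self.net = nn.Linear(latent_dim, pose_dim)

    def forward(self, z_t):
        return self.net(z_t)

@torch.no_grad()
def stream_motion(ar, dec, text_emb, latent_dim=16, max_steps=120):
    z = torch.zeros(text_emb.size(0), 1, latent_dim)   # assumed start latent
    poses = []
    for _ in range(max_steps):                          # fixed budget; the
        z_next = ar.sample_next(text_emb, z)            # paper predicts the end
        z = torch.cat([z, z_next.unsqueeze(1)], dim=1)  # grow the history
        poses.append(dec(z_next))                       # decode immediately
    return torch.stack(poses, dim=1)                    # (batch, T, pose_dim)

poses = stream_motion(ContinuousARModel(), CausalDecoder(),
                      text_emb=torch.randn(1, 1, 32))
print(poses.shape)  # torch.Size([1, 120, 66])
```

The key property the sketch illustrates is causality: the pose at step t depends only on latents up to t, so nothing waits on future tokens, whereas a non-causal tokenizer must see a complete window before decoding, which is what causes the delayed responses the abstract attributes to GPT-based methods.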