MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space
March 19, 2025
Authors: Lixing Xiao, Shunlin Lu, Huaijin Pi, Ke Fan, Liang Pan, Yueer Zhou, Ziyong Feng, Xiaowei Zhou, Sida Peng, Jingbo Wang
cs.AI
Abstract
This paper addresses the challenge of text-conditioned streaming motion
generation, which requires us to predict the next-step human pose based on
variable-length historical motions and incoming texts. Existing methods
struggle to achieve streaming motion generation: diffusion models are
constrained by pre-defined motion lengths, while GPT-based methods suffer from
delayed responses and error accumulation due to discretized, non-causal
tokenization. To solve these problems, we propose MotionStreamer, a novel
framework that incorporates a continuous causal latent space into a
probabilistic autoregressive model. The continuous latents mitigate information
loss caused by discretization and effectively reduce error accumulation during
long-term autoregressive generation. In addition, by establishing temporal
causal dependencies between current and historical motion latents, our model
fully utilizes the available information to achieve accurate online motion
decoding. Experiments show that our method outperforms existing approaches
while offering more applications, including multi-round generation, long-term
generation, and dynamic motion composition. Project Page:
https://zju3dv.github.io/MotionStreamer/
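To make the streaming setup concrete, below is a minimal, self-contained sketch of the core loop described in the abstract: a probabilistic autoregressive model samples the next continuous motion latent conditioned on the text and the variable-length latent history, and a causal decoder emits a pose immediately at each step. All module names, dimensions, and the GRU backbone are illustrative assumptions for exposition, not the paper's actual architecture; termination handling (the model deciding when the motion ends) is reduced to a fixed step budget for brevity.

```python
# Minimal sketch (assumptions, not the paper's implementation) of streaming
# motion generation with continuous latents and causal online decoding.
import torch
import torch.nn as nn

class ContinuousARModel(nn.Module):
    """Predicts the next continuous latent from text + latent history."""
    def __init__(self, latent_dim=16, text_dim=32, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(latent_dim + text_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2 * latent_dim)  # mean and log-variance

    def sample_next(self, text_emb, history):
        # Condition every step on the text embedding (simple concatenation).
        x = torch.cat([history, text_emb.expand(-1, history.size(1), -1)], -1)
        h = self.rnn(x)[0][:, -1]                      # last hidden state
        mean, logvar = self.head(h).chunk(2, dim=-1)
        # Sampling a continuous latent avoids quantizing to a discrete codebook.
        return mean + torch.randn_like(mean) * (0.5 * logvar).exp()

class CausalDecoder(nn.Module):
    """Maps a latent to a pose using only current and past information,
    so each pose can be decoded online as soon as its latent arrives."""
    def __init__(self, latent_dim=16, pose_dim=66):
        super().__init__()
        # Stand-in for a causal conv/transformer decoder.
        self.net = nn.Linear(latent_dim, pose_dim)

    def forward(self, z_t):
        return self.net(z_t)

@torch.no_grad()
def stream_motion(ar, dec, text_emb, latent_dim=16, max_steps=120):
    z = torch.zeros(text_emb.size(0), 1, latent_dim)   # assumed start latent
    poses = []
    for _ in range(max_steps):                          # fixed budget; the
        z_next = ar.sample_next(text_emb, z)            # paper predicts the end
        z = torch.cat([z, z_next.unsqueeze(1)], dim=1)  # grow the history
        poses.append(dec(z_next))                       # decode immediately
    return torch.stack(poses, dim=1)                    # (batch, T, pose_dim)

poses = stream_motion(ContinuousARModel(), CausalDecoder(),
                      text_emb=torch.randn(1, 1, 32))
print(poses.shape)  # torch.Size([1, 120, 66])
```

The key property the sketch illustrates is causality: the pose at step t depends only on latents up to t, so nothing waits on future tokens, whereas a non-causal tokenizer must see a complete window before decoding, which is what causes the delayed responses the abstract attributes to GPT-based methods.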