
Causal Motion Diffusion Models for Autoregressive Motion Generation

February 26, 2026
Authors: Qing Yu, Akihisa Watanabe, Kent Fujiwara
cs.AI

Abstract

Recent advances in motion diffusion models have substantially improved the realism of human motion synthesis. However, existing approaches either rely on full-sequence diffusion models with bidirectional generation, which limits temporal causality and real-time applicability, or autoregressive models that suffer from instability and cumulative errors. In this work, we present Causal Motion Diffusion Models (CMDM), a unified framework for autoregressive motion generation based on a causal diffusion transformer that operates in a semantically aligned latent space. CMDM builds upon a Motion-Language-Aligned Causal VAE (MAC-VAE), which encodes motion sequences into temporally causal latent representations. On top of this latent representation, an autoregressive diffusion transformer is trained using causal diffusion forcing to perform temporally ordered denoising across motion frames. To achieve fast inference, we introduce a frame-wise sampling schedule with causal uncertainty, where each subsequent frame is predicted from partially denoised previous frames. The resulting framework supports high-quality text-to-motion generation, streaming synthesis, and long-horizon motion generation at interactive rates. Experiments on HumanML3D and SnapMoGen demonstrate that CMDM outperforms existing diffusion and autoregressive models in both semantic fidelity and temporal smoothness, while substantially reducing inference latency.
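The frame-wise sampling schedule with causal uncertainty can be pictured as a staggered noise schedule: every frame is denoised over the same number of steps, but later frames lag earlier ones, so each frame is always conditioned on predecessors that are only partially denoised. The sketch below is an illustrative reconstruction, not the paper's implementation; the `offset` parameter (how many steps each frame lags its predecessor) and the linear noise-level mapping are assumptions for clarity.

```python
import numpy as np

def causal_noise_schedule(num_frames: int, num_steps: int, offset: int = 2) -> np.ndarray:
    """Noise level sigma[s, t] of frame t at sampling step s.

    Frame t starts denoising `offset * t` steps after frame 0, so at any
    step earlier frames are cleaner than later ones -- the causal
    staggering that lets each frame be predicted from partially denoised
    previous frames. sigma = 1.0 means pure noise, 0.0 means fully denoised.
    """
    total = num_steps + offset * (num_frames - 1)  # steps until last frame finishes
    sched = np.empty((total + 1, num_frames))
    for s in range(total + 1):
        for t in range(num_frames):
            # linear denoising progress of frame t, clipped to [0, 1]
            progress = np.clip((s - offset * t) / num_steps, 0.0, 1.0)
            sched[s, t] = 1.0 - progress
    return sched

# 4 latent frames, 8 denoising steps per frame, 2-step causal lag
sched = causal_noise_schedule(num_frames=4, num_steps=8, offset=2)
```

At every row of `sched`, noise levels are non-decreasing from left to right (earlier frames are never noisier than later ones), and the final row is all zeros (every frame fully denoised). A full sampler would, at each step, call the causal diffusion transformer once and advance every in-flight frame by one noise level.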