

Absolute Coordinates Make Motion Generation Easy

May 26, 2025
Authors: Zichong Meng, Zeyu Han, Xiaogang Peng, Yiming Xie, Huaizu Jiang
cs.AI

Abstract

State-of-the-art text-to-motion generation models rely on the kinematic-aware, local-relative motion representation popularized by HumanML3D, which encodes motion relative to the pelvis and to the previous frame with built-in redundancy. While this design simplifies training for earlier generation models, it introduces critical limitations for diffusion models and hinders applicability to downstream tasks. In this work, we revisit the motion representation and propose a radically simplified and long-abandoned alternative for text-to-motion generation: absolute joint coordinates in global space. Through systematic analysis of design choices, we show that this formulation achieves significantly higher motion fidelity, improved text alignment, and strong scalability, even with a simple Transformer backbone and no auxiliary kinematic-aware losses. Moreover, our formulation naturally supports downstream tasks such as text-driven motion control and temporal/spatial editing without additional task-specific reengineering or the costly generation of classifier guidance from control signals. Finally, we demonstrate promising generalization to directly generating SMPL-H mesh vertices in motion from text, laying a strong foundation for future research and motion-related applications.
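To make the contrast concrete, here is a minimal sketch (not the paper's code) of the two representations the abstract compares. A HumanML3D-style local-relative encoding stores per-frame root deltas plus pelvis-relative joint offsets, so global positions must be recovered by integrating over time; the absolute formulation simply stores global `(T, J, 3)` coordinates directly. All shapes, names, and the omission of root rotation are illustrative assumptions made for brevity.

```python
import numpy as np

def relative_to_absolute(root_dxz, root_height, local_joints):
    """Recover global joint positions from a pelvis-relative encoding.

    root_dxz:     (T, 2)    per-frame root displacement on the ground (x, z)
    root_height:  (T,)      root height (y) per frame
    local_joints: (T, J, 3) joint positions relative to the root

    Root rotation is ignored here for simplicity; HumanML3D also encodes it.
    """
    # Integrate the per-frame deltas to get the root's ground-plane trajectory.
    root_xz = np.cumsum(root_dxz, axis=0)
    root = np.stack([root_xz[:, 0], root_height, root_xz[:, 1]], axis=-1)
    # Offset every joint by the recovered root position (broadcast over J).
    return local_joints + root[:, None, :]

# The absolute representation needs no recovery step: the model's output
# already *is* the (T, J, 3) tensor of global joint coordinates.
T, J = 4, 22  # e.g. 4 frames of a 22-joint skeleton (illustrative sizes)
abs_motion = np.zeros((T, J, 3))  # directly usable, no decoding pass
```

The recovery step is exactly what the absolute formulation removes: because each frame is self-contained, per-frame control and temporal/spatial editing can operate on coordinates directly instead of propagating edits through integrated deltas.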

