Absolute Coordinates Make Motion Generation Easy
May 26, 2025
Authors: Zichong Meng, Zeyu Han, Xiaogang Peng, Yiming Xie, Huaizu Jiang
cs.AI
Abstract
State-of-the-art text-to-motion generation models rely on the
kinematic-aware, local-relative motion representation popularized by HumanML3D,
which encodes motion relative to the pelvis and to the previous frame with
built-in redundancy. While this design simplifies training for earlier
generation models, it introduces critical limitations for diffusion models and
hinders applicability to downstream tasks. In this work, we revisit the motion
representation and propose a radically simplified and long-abandoned
alternative for text-to-motion generation: absolute joint coordinates in global
space. Through systematic analysis of design choices, we show that this
formulation achieves significantly higher motion fidelity, improved text
alignment, and strong scalability, even with a simple Transformer backbone and
no auxiliary kinematic-aware losses. Moreover, our formulation naturally
supports downstream tasks such as text-driven motion control and
temporal/spatial editing without additional task-specific reengineering and
costly classifier guidance generation from control signals. Finally, we
demonstrate promising generalization to directly generate SMPL-H mesh vertices
in motion from text, laying a strong foundation for future research and
motion-related applications.