Riemannian Motion Generation: A Unified Framework for Human Motion Representation and Generation via Riemannian Flow Matching

Abstract

Human motion generation is often learned in Euclidean spaces, although valid motions follow structured non-Euclidean geometry. We present Riemannian Motion Generation (RMG), a unified framework that represents motion on a product manifold and learns dynamics via Riemannian flow matching. RMG factorizes motion into several manifold factors, yielding a scale-free representation with intrinsic normalization, and uses geodesic interpolation, tangent-space supervision, and manifold-preserving ODE integration for training and sampling. On HumanML3D, RMG achieves state-of-the-art FID in the HumanML3D format (0.043) and ranks first on all reported metrics under the MotionStreamer format. On MotionMillion, it also surpasses strong baselines (FID 5.6, R@1 0.86). Ablations show that the compact T+R (translation + rotations) representation is the most stable and effective, highlighting geometry-aware modeling as a practical and scalable route to high-fidelity motion generation.
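The three ingredients named in the abstract (geodesic interpolation, tangent-space supervision, and manifold-preserving ODE integration) can be illustrated on a single toy manifold. The sketch below is not the authors' implementation: it assumes the unit sphere as a stand-in for one manifold factor, and all function names (`exp_map`, `log_map`, `geodesic_point`, `target_velocity`, `integrate`) are hypothetical helpers introduced only for this illustration. The regression target used here, log_{x_t}(x_1) / (1 - t), is the usual flow-matching target along a geodesic path; whether RMG uses exactly this form is an assumption.

```python
import numpy as np


def exp_map(x, v):
    """Exponential map on the unit sphere: start at x, move along tangent vector v."""
    norm = np.linalg.norm(v)
    if norm < 1e-12:
        return x
    return np.cos(norm) * x + np.sin(norm) * v / norm


def log_map(x, y):
    """Logarithm map: tangent vector at x pointing toward y, with geodesic length."""
    cos_theta = np.clip(np.dot(x, y), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    u = y - cos_theta * x  # part of y orthogonal to x
    u_norm = np.linalg.norm(u)
    if u_norm < 1e-12:
        return np.zeros_like(x)
    return theta * u / u_norm


def geodesic_point(x0, x1, t):
    """Geodesic interpolation between x0 (t=0) and x1 (t=1)."""
    return exp_map(x0, t * log_map(x0, x1))


def target_velocity(x0, x1, t):
    """Tangent-space regression target at x_t (assumed form, valid for t < 1)."""
    xt = geodesic_point(x0, x1, t)
    return xt, log_map(xt, x1) / (1.0 - t)


def integrate(velocity_fn, x, steps=100):
    """Manifold-preserving Euler integration: each step uses the exponential map."""
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity_fn(x, t)
        v = v - np.dot(v, x) * x  # project onto the tangent space at x
        x = exp_map(x, dt * v)
    return x
```

In training, a network would be regressed onto the velocity returned by `target_velocity`; at sampling time that network would replace `velocity_fn` in `integrate`. Because every update goes through `exp_map`, the iterate stays on the sphere up to floating-point error, with no need for a separate re-normalization step.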
