MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm

Abstract

Human motion generation and editing are key components of computer graphics and vision. However, current approaches in this field tend to offer isolated solutions tailored to specific tasks, which can be inefficient and impractical for real-world applications. While some efforts have aimed to unify motion-related tasks, these methods simply use different modalities as conditions to guide motion generation. Consequently, they lack editing capabilities and fine-grained control, and they fail to facilitate knowledge sharing across tasks. To address these limitations and provide a versatile, unified framework capable of handling both human motion generation and editing, we introduce a novel paradigm, Motion-Condition-Motion, which enables the unified formulation of diverse tasks with three concepts: source motion, condition, and target motion. Based on this paradigm, we propose a unified framework, MotionLab, which incorporates rectified flows to learn the mapping from source motion to target motion, guided by the specified conditions. In MotionLab, we introduce 1) the MotionFlow Transformer to enhance conditional generation and editing without task-specific modules; 2) Aligned Rotational Position Encoding to guarantee time synchronization between source motion and target motion; 3) Task Specified Instruction Modulation; and 4) Motion Curriculum Learning for effective multi-task learning and knowledge sharing across tasks. Notably, MotionLab demonstrates promising generalization capabilities and inference efficiency across multiple benchmarks for human motion. Our code and additional video results are available at: https://diouo.github.io/motionlab.github.io/.
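The abstract states only that MotionLab uses rectified flows to learn a mapping from source motion to target motion under a condition. As a rough illustration of the generic rectified-flow recipe (not MotionLab's implementation), the sketch below assumes motions are arrays of shape (frames, features), uses the straight path x_t = (1 - t) * source + t * target with velocity target `target - source`, and integrates with plain Euler steps. The function `oracle_velocity` is a hypothetical stand-in for a learned velocity network (in MotionLab that role presumably falls to the MotionFlow Transformer, whose interface is not described here); it cheats by receiving the target motion as its "condition".

```python
import numpy as np

def interpolate(source, target, t):
    """Point at time t on the straight path from source (t=0) to target (t=1)."""
    return (1.0 - t) * source + t * target

def velocity_target(source, target):
    """Rectified-flow regression target: the straight path's constant velocity."""
    return target - source

def oracle_velocity(x_t, t, condition):
    """HYPOTHETICAL stand-in for a learned, condition-guided velocity network.

    It cheats by treating `condition` as the target motion itself, so it
    returns the exact straight-line velocity from x_t toward that target.
    """
    return (condition - x_t) / (1.0 - t)

def flow_matching_loss(velocity_fn, source, target, condition, rng):
    """Mean squared error between predicted and true velocity at a random t."""
    t = rng.uniform(0.0, 1.0)
    x_t = interpolate(source, target, t)
    pred = velocity_fn(x_t, t, condition)
    return float(np.mean((pred - velocity_target(source, target)) ** 2))

def euler_sample(velocity_fn, source, condition, num_steps=10):
    """Integrate dx/dt = v(x, t, condition) from t=0 to t=1, starting at source."""
    x = source.copy()
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        x = x + dt * velocity_fn(x, t, condition)
    return x

# Toy motions: 8 frames x 4 features (shapes are illustrative only).
rng = np.random.default_rng(0)
source = rng.normal(size=(8, 4))
target = rng.normal(size=(8, 4))
edited = euler_sample(oracle_velocity, source, condition=target, num_steps=10)
print(np.allclose(edited, target))  # True
```

Because the straight path has a constant velocity, an exact velocity field reaches the target in a handful of Euler steps; a trained network only approximates it, but the near-straight paths are one reason rectified flows are associated with efficient few-step inference.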
