Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text

Abstract

Generating natural human motion from a story has the potential to transform the animation, gaming, and film industries. A new and challenging task, Story-to-Motion, arises when characters must move to various locations and perform specific motions based on a long text description. This task demands a fusion of low-level control (trajectories) and high-level control (motion semantics). Previous work on character control and text-to-motion has addressed related aspects, yet a comprehensive solution remains elusive: character control methods do not handle text descriptions, whereas text-to-motion methods lack position constraints and often produce unstable motions. In light of these limitations, we propose a novel system that generates controllable, infinitely long motions and trajectories aligned with the input text. (1) We leverage contemporary Large Language Models as a text-driven motion scheduler that extracts a series of (text, position, duration) tuples from long text. (2) We develop a text-driven motion retrieval scheme that incorporates motion matching with motion semantic and trajectory constraints. (3) We design a progressive mask transformer that addresses common artifacts in the transition motion, such as unnatural poses and foot sliding. Beyond serving as the first comprehensive solution for Story-to-Motion, our system is evaluated on three distinct sub-tasks: trajectory following, temporal action composition, and motion blending, where it outperforms previous state-of-the-art motion synthesis methods across the board. Homepage: https://story2motion.github.io/.
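
To make the first two components more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a schedule entry carries the (text, position, duration) fields named in the abstract and that retrieval can be approximated by a weighted sum of a semantic cost and a trajectory cost. The names `ScheduleItem`, `Clip`, `retrieve`, and the weight `w_traj` are hypothetical, and the toy embeddings stand in for whatever text encoder a real system would use.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ScheduleItem:
    """One scheduler output entry (hypothetical structure)."""
    text: str                      # action description, e.g. "walk"
    position: Tuple[float, float]  # target location on the ground plane
    duration: float                # seconds


@dataclass
class Clip:
    """A database motion clip (hypothetical structure)."""
    label: str
    embedding: List[float]             # toy stand-in for a text embedding
    displacement: Tuple[float, float]  # net root displacement of the clip


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_embedding: List[float],
             desired_displacement: Tuple[float, float],
             clips: List[Clip],
             w_traj: float = 0.5) -> Clip:
    """Return the clip with the lowest combined cost.

    Assumption: cost = semantic distance + w_traj * trajectory distance.
    """
    def cost(clip: Clip) -> float:
        semantic = 1.0 - cosine(query_embedding, clip.embedding)
        trajectory = math.dist(desired_displacement, clip.displacement)
        return semantic + w_traj * trajectory

    return min(clips, key=cost)


# Toy database with hand-made 2D "embeddings".
clips = [
    Clip("walk_forward", [1.0, 0.0], (1.0, 0.0)),
    Clip("walk_left", [1.0, 0.0], (0.0, 1.0)),
    Clip("sit", [0.0, 1.0], (0.0, 0.0)),
]

item = ScheduleItem(text="walk", position=(1.0, 0.0), duration=2.0)
best = retrieve([1.0, 0.0], item.position, clips)
```

In this toy setup the two walking clips match the query text equally well, so the trajectory term decides between them. That interplay of semantic and positional constraints is what the retrieval step in the abstract describes.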
