Animate-X: Universal Character Image Animation with Enhanced Motion Representation

Abstract

Character image animation, which generates high-quality videos from a reference image and a target pose sequence, has made significant progress in recent years. However, most existing methods apply only to human figures and generalize poorly to anthropomorphic characters, which are common in industries such as gaming and entertainment. Our in-depth analysis attributes this limitation to insufficient motion modeling: these methods cannot comprehend the movement pattern of the driving video and therefore impose the pose sequence rigidly onto the target character. To address this, we propose Animate-X, a universal animation framework based on a latent diffusion model (LDM) for various character types (collectively named X), including anthropomorphic characters. To enhance motion representation, we introduce the Pose Indicator, which captures comprehensive motion patterns from the driving video in both an implicit and an explicit manner. The implicit part leverages CLIP visual features of the driving video to extract the gist of its motion, such as the overall movement pattern and the temporal relations among motions. The explicit part strengthens the generalization of the LDM by simulating, in advance, inputs that may arise during inference. Moreover, we introduce a new Animated Anthropomorphic Benchmark (A^2Bench) to evaluate Animate-X on universal and widely applicable animation images. Extensive experiments demonstrate the effectiveness of Animate-X and its superiority over state-of-the-art methods.
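The abstract describes the explicit part of the Pose Indicator only at a high level ("simulating possible inputs in advance"). The sketch below shows one way such simulation could look. It assumes 2D skeleton keypoints and a hypothetical 5-joint skeleton, and it is not the authors' implementation. During training, it randomly rescales each bone along its original direction and then applies a small global shift. The idea is to expose the model to driving poses whose body proportions do not match the reference character. The names `BONES`, `rescale_bones`, `simulate_pose`, `bone_lengths`, `scale_range`, and `max_shift` are all illustrative assumptions.

```python
import numpy as np

# Hypothetical 5-joint skeleton, for illustration only (not the paper's format):
# 0 = head, 1 = neck, 2 = pelvis, 3 = left hand, 4 = right hand.
# Bones are (parent, child) pairs listed parent-before-child.
BONES = [(0, 1), (1, 2), (1, 3), (1, 4)]


def rescale_bones(keypoints, bones, scales):
    """Return a copy of (K, 2) keypoints with bone i's length multiplied by scales[i].

    Each bone keeps its original direction. A child joint is re-placed relative
    to its (already moved) parent, so descendants follow their ancestors.
    """
    original = np.asarray(keypoints, dtype=float)
    out = original.copy()
    for (parent, child), s in zip(bones, scales):
        offset = original[child] - original[parent]  # original bone vector
        out[child] = out[parent] + s * offset
    return out


def simulate_pose(keypoints, bones, rng, scale_range=(0.7, 1.3), max_shift=0.05):
    """Randomly perturb the body proportions and position of a training pose.

    This is an assumed stand-in for "simulating inputs that may arise during
    inference", e.g. a driving skeleton whose limb ratios differ from a cartoon
    reference character.
    """
    scales = rng.uniform(scale_range[0], scale_range[1], size=len(bones))
    out = rescale_bones(keypoints, bones, scales)
    shift = rng.uniform(-max_shift, max_shift, size=2)  # global 2D translation
    return out + shift


def bone_lengths(keypoints, bones):
    """Euclidean length of every (parent, child) bone."""
    kp = np.asarray(keypoints, dtype=float)
    return np.array([np.linalg.norm(kp[c] - kp[p]) for p, c in bones])


if __name__ == "__main__":
    pose = np.array([[0.5, 0.2], [0.5, 0.4], [0.5, 0.7], [0.3, 0.4], [0.7, 0.4]])
    rng = np.random.default_rng(0)
    augmented = simulate_pose(pose, BONES, rng)
    # Per-bone length ratios; each lies inside scale_range.
    print(bone_lengths(augmented, BONES) / bone_lengths(pose, BONES))
```

Rescaling bones along their original directions changes proportions while keeping the joint angles, and therefore the motion itself, intact. Under this assumption, a model trained on such poses would be pushed to follow the movement rather than copy the exact skeleton shape of the driving video.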
