AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion Models

Abstract

Despite recent advances in learning-based motion in-betweening, a key limitation has been overlooked: the requirement for character-specific datasets. In this work, we introduce AnyMoLe, a novel method that addresses this limitation by leveraging video diffusion models to generate in-between motion frames for arbitrary characters without external data. Our approach employs a two-stage frame generation process to enhance contextual understanding. Furthermore, to bridge the domain gap between real-world and rendered character animations, we introduce ICAdapt, a fine-tuning technique for video diffusion models. Additionally, we propose a "motion-video mimicking" optimization technique that uses 2D and 3D-aware features to enable seamless motion generation for characters with arbitrary joint structures. AnyMoLe significantly reduces data dependency while generating smooth and realistic transitions, making it applicable to a wide range of motion in-betweening tasks.