DreamVideo: Composing Your Dream Videos with Customized Subject and Motion

Abstract

Customized generation using diffusion models has made impressive progress in image generation, but remains unsatisfactory for the more challenging task of video generation, which requires control over both subject and motion. To that end, we present DreamVideo, a novel approach to generating personalized videos from a few static images of the desired subject and a few videos of the target motion. DreamVideo decouples this task into two stages, subject learning and motion learning, by leveraging a pre-trained video diffusion model. Subject learning aims to accurately capture the fine appearance of the subject from the provided images, which is achieved by combining textual inversion with fine-tuning of our carefully designed identity adapter. In motion learning, we design a motion adapter and fine-tune it on the given videos to effectively model the target motion pattern. Combining these two lightweight and efficient adapters allows for flexible customization of any subject with any motion. Extensive experimental results demonstrate the superior performance of DreamVideo over state-of-the-art methods for customized video generation. Our project page is at https://dreamvideo-t2v.github.io.
