DreamVideo: Composing Your Dream Videos with Customized Subject and Motion

Abstract

Customized generation using diffusion models has made impressive progress in image generation, but remains unsatisfactory for the challenging task of video generation, which requires control over both subject and motion. To address this, we present DreamVideo, a novel approach to generating personalized videos from a few static images of the desired subject and a few videos of the target motion. DreamVideo decouples this task into two stages, subject learning and motion learning, by leveraging a pre-trained video diffusion model. Subject learning aims to accurately capture the fine appearance of the subject from the provided images, which is achieved by combining textual inversion with fine-tuning of our carefully designed identity adapter. In motion learning, we design a motion adapter and fine-tune it on the given videos to effectively model the target motion pattern. Combining these two lightweight and efficient adapters allows flexible customization of any subject with any motion. Extensive experimental results demonstrate the superior performance of DreamVideo over state-of-the-art methods for customized video generation. Our project page is at https://dreamvideo-t2v.github.io.

