AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
July 10, 2023
Authors: Yuwei Guo, Ceyuan Yang, Anyi Rao, Yaohui Wang, Yu Qiao, Dahua Lin, Bo Dai
cs.AI
Abstract
With the advance of text-to-image models (e.g., Stable Diffusion) and
corresponding personalization techniques such as DreamBooth and LoRA, everyone
can manifest their imagination into high-quality images at an affordable cost.
Subsequently, there is a great demand for image animation techniques to further
combine generated static images with motion dynamics. In this report, we
propose a practical framework to animate most of the existing personalized
text-to-image models once and for all, saving efforts in model-specific tuning.
At the core of the proposed framework is to insert a newly initialized motion
modeling module into the frozen text-to-image model and train it on video clips
to distill reasonable motion priors. Once trained, by simply injecting this
motion modeling module, all personalized versions derived from the same base
T2I readily become text-driven models that produce diverse and personalized
animated images. We conduct our evaluation on several public representative
personalized text-to-image models across anime pictures and realistic
photographs, and demonstrate that our proposed framework helps these models
generate temporally smooth animation clips while preserving the domain and
diversity of their outputs. Code and pre-trained weights will be publicly
available at https://animatediff.github.io/.
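To make the mechanism described in the abstract concrete: the pretrained text-to-image UNet stays frozen, and a newly initialized temporal ("motion") module, applied across the frame axis of the feature maps, is the only component trained on video clips. The sketch below is a minimal PyTorch approximation of that idea under stated assumptions; the names (MotionModule, trainable_motion_parameters), the zero-initialized output projection, and the omission of frame positional encodings are illustrative choices, not the authors' released implementation.

```python
# Minimal sketch: a temporal self-attention module inserted into a frozen 2D UNet.
# Names and design details here are assumptions for illustration only.
import torch
import torch.nn as nn


class MotionModule(nn.Module):
    """Temporal self-attention applied across the frame axis of a video batch."""

    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Zero-initialized output projection so the module acts as an identity at
        # the start of training and does not disturb the frozen T2I behavior.
        self.proj_out = nn.Linear(channels, channels)
        nn.init.zeros_(self.proj_out.weight)
        nn.init.zeros_(self.proj_out.bias)

    def forward(self, x: torch.Tensor, num_frames: int) -> torch.Tensor:
        # x: (batch * frames, channels, height, width) feature map from the 2D UNet.
        bf, c, h, w = x.shape
        b = bf // num_frames
        # Rearrange so attention mixes information across frames at each spatial
        # location. (A positional encoding over the frame index is omitted here.)
        tokens = x.reshape(b, num_frames, c, h * w).permute(0, 3, 1, 2)  # (b, hw, f, c)
        tokens = tokens.reshape(b * h * w, num_frames, c)
        normed = self.norm(tokens)
        attn_out, _ = self.attn(normed, normed, normed)
        tokens = tokens + self.proj_out(attn_out)  # residual connection
        tokens = tokens.reshape(b, h * w, num_frames, c).permute(0, 2, 3, 1)
        return tokens.reshape(bf, c, h, w)


def trainable_motion_parameters(unet: nn.Module, motion_modules: nn.ModuleList):
    """Freeze the pretrained T2I UNet; only the motion modules receive gradients."""
    for p in unet.parameters():
        p.requires_grad_(False)
    return [p for m in motion_modules for p in m.parameters()]
```

Per the abstract, once such a module has been trained against the base T2I model, the same weights can be injected, without further tuning, into any personalized checkpoint derived from that base (e.g., a DreamBooth or LoRA fine-tune), turning it into a text-driven animation generator.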