StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

May 2, 2024
Authors: Yupeng Zhou, Daquan Zhou, Ming-Ming Cheng, Jiashi Feng, Qibin Hou
cs.AI

Abstract

For recent diffusion-based generative models, maintaining consistent content across a series of generated images, especially those containing subjects and complex details, presents a significant challenge. In this paper, we propose a new way of computing self-attention, termed Consistent Self-Attention, that significantly boosts the consistency between the generated images and augments prevalent pretrained diffusion-based text-to-image models in a zero-shot manner. To extend our method to long-range video generation, we further introduce a novel temporal motion prediction module that operates in semantic space, named Semantic Motion Predictor. It is trained to estimate the motion conditions between two provided images in semantic space. This module converts the generated sequence of images into videos with smooth transitions and consistent subjects, and it is significantly more stable than modules based on latent spaces only, especially for long video generation. By merging these two novel components, our framework, referred to as StoryDiffusion, can describe a text-based story with consistent images or videos encompassing a rich variety of content. The proposed StoryDiffusion is a pioneering exploration of visual story generation through both images and videos, which we hope will inspire more research on architectural modifications. Our code is publicly available at https://github.com/HVision-NKU/StoryDiffusion.
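
The abstract does not spell out how Consistent Self-Attention is computed. The sketch below illustrates one plausible reading, stated here as an assumption: within a batch of images from the same story, each image's keys and values are extended with tokens sampled from the other images, so attention can draw on shared subject features. The function name `consistent_self_attention`, the `sample_rate` parameter, and the single-head NumPy formulation are all hypothetical and are not taken from the authors' repository.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def consistent_self_attention(tokens, w_q, w_k, w_v, sample_rate=0.5, rng=None):
    """Single-head attention over a batch of images from one story.

    tokens: array of shape (B, N, C), N tokens per image.
    ASSUMPTION: each image attends to its own tokens plus a random subset
    of tokens from every other image in the batch.
    """
    rng = np.random.default_rng() if rng is None else rng
    b, n, _ = tokens.shape
    n_sampled = int(n * sample_rate)
    out = np.empty((b, n, w_v.shape[1]))
    for i in range(b):
        # Tokens borrowed from the other images in the batch.
        shared = []
        for j in range(b):
            if j == i:
                continue
            idx = rng.choice(n, size=n_sampled, replace=False)
            shared.append(tokens[j, idx])
        # Keys/values see own tokens plus the sampled ones; queries stay local.
        kv_source = np.concatenate([tokens[i]] + shared, axis=0)
        q = tokens[i] @ w_q
        k = kv_source @ w_k
        v = kv_source @ w_v
        attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))
        out[i] = attn @ v
    return out
```

In this sketch the projection matrices are reused unchanged and only the set of keys and values grows, which is in keeping with the abstract's claim that the method augments pretrained models in a zero-shot manner. Setting `sample_rate=0` reduces it to ordinary per-image self-attention.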
