Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis

December 20, 2023
Authors: Bichen Wu, Ching-Yao Chuang, Xiaoyan Wang, Yichen Jia, Kapil Krishnakumar, Tong Xiao, Feng Liang, Licheng Yu, Peter Vajda
cs.AI

Abstract

In this paper, we introduce Fairy, a minimalist yet robust adaptation of image-editing diffusion models, enhancing them for video editing applications. Our approach centers on the concept of anchor-based cross-frame attention, a mechanism that implicitly propagates diffusion features across frames, ensuring superior temporal coherence and high-fidelity synthesis. Fairy not only addresses limitations of previous models, including memory and processing speed, but also improves temporal consistency through a unique data augmentation strategy. This strategy renders the model equivariant to affine transformations in both source and target images. Remarkably efficient, Fairy generates 120-frame 512x384 videos (4-second duration at 30 FPS) in just 14 seconds, outpacing prior works by at least 44x. A comprehensive user study, involving 1000 generated samples, confirms that our approach delivers superior quality, decisively outperforming established methods.
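
The abstract does not spell out how anchor-based cross-frame attention is wired, so the following is only a minimal NumPy sketch under a stated assumption: keys and values are computed from a small set of anchor frames, and every frame's queries attend to them. The function name, argument names, and projection matrices are hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def anchor_cross_frame_attention(frame_feats, anchor_feats, w_q, w_k, w_v):
    """Illustrative sketch only (assumed wiring, not the authors' code).

    frame_feats:  (F, N, D) features for F frames with N tokens each.
    anchor_feats: (A, N, D) features for A anchor frames.
    w_q, w_k:     (D, Dk) query/key projections; w_v: (D, Dv) value projection.
    Returns an array of shape (F, N, Dv).
    """
    # Assumption: all anchor tokens are pooled into one shared key/value set.
    anchor_tokens = anchor_feats.reshape(-1, anchor_feats.shape[-1])  # (A*N, D)
    k = anchor_tokens @ w_k                                           # (A*N, Dk)
    v = anchor_tokens @ w_v                                           # (A*N, Dv)

    # Every frame forms its own queries but attends to the same anchors.
    q = frame_feats @ w_q                                             # (F, N, Dk)
    scores = q @ k.T / np.sqrt(k.shape[-1])                           # (F, N, A*N)
    weights = softmax(scores, axis=-1)
    return weights @ v                                                # (F, N, Dv)


# Toy usage with random features: 6 frames, 4 tokens, 8 channels, 2 anchors.
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4, 8))
anchors = feats[[0, 3]]
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = anchor_cross_frame_attention(feats, anchors, w_q, w_k, w_v)
```

Under this assumption, each frame's output depends only on that frame and the shared anchor features, so frames can be processed independently of one another, which is one plausible reading of the "parallelized" in the title.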