
Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis

December 20, 2023
Authors: Bichen Wu, Ching-Yao Chuang, Xiaoyan Wang, Yichen Jia, Kapil Krishnakumar, Tong Xiao, Feng Liang, Licheng Yu, Peter Vajda
cs.AI

Abstract

In this paper, we introduce Fairy, a minimalist yet robust adaptation of image-editing diffusion models that enhances them for video editing applications. Our approach centers on anchor-based cross-frame attention, a mechanism that implicitly propagates diffusion features across frames, ensuring superior temporal coherence and high-fidelity synthesis. Fairy not only addresses limitations of previous models, including memory and processing speed, but also improves temporal consistency through a unique data augmentation strategy. This strategy renders the model equivariant to affine transformations in both source and target images. Remarkably efficient, Fairy generates 120-frame 512x384 videos (4-second duration at 30 FPS) in just 14 seconds, at least 44x faster than prior work. A comprehensive user study, involving 1000 generated samples, confirms that our approach delivers superior quality, decisively outperforming established methods.
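
The abstract names anchor-based cross-frame attention but does not spell out how it is computed. The block below is a minimal sketch of one plausible reading, not the authors' implementation: every frame's tokens act as queries, while keys and values come only from a small set of anchor frames. The function names, the identity query/key/value projections, the choice of anchors, and the toy data are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def anchor_cross_frame_attention(frame_tokens, anchor_frames):
    """Attend from one frame's tokens to the tokens of all anchor frames.

    frame_tokens:  list of d-dimensional token vectors for the current frame.
    anchor_frames: list of frames, each a list of d-dimensional token vectors.
    Assumption: queries, keys, and values are the raw token vectors
    (no learned projections), which keeps the sketch short.
    """
    # Pool the tokens of every anchor frame into one shared key/value set.
    anchor_tokens = [tok for frame in anchor_frames for tok in frame]
    scale = 1.0 / math.sqrt(len(frame_tokens[0]))
    outputs = []
    for query in frame_tokens:
        weights = softmax([scale * dot(query, key) for key in anchor_tokens])
        # Weighted average of the anchor values, one dimension at a time.
        outputs.append([dot(weights, column) for column in zip(*anchor_tokens)])
    return outputs


# Hypothetical toy video: 4 frames, 2 tokens per frame, 3 dimensions per token.
video = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]],
    [[0.8, 0.0, 0.2], [0.0, 0.8, 0.2]],
    [[0.7, 0.1, 0.2], [0.1, 0.7, 0.2]],
]
anchors = [video[0], video[2]]  # assumed anchor choice, for illustration only
edited = [anchor_cross_frame_attention(frame, anchors) for frame in video]
```

In this sketch each output token is a convex combination of anchor tokens, so all frames draw on the same features, which is one way features could propagate across frames. Each call depends only on its own frame and the shared anchors, so the per-frame calls are independent of one another.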