FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation

March 19, 2024
Authors: Shuai Yang, Yifan Zhou, Ziwei Liu, Chen Change Loy
cs.AI

Abstract

The remarkable efficacy of text-to-image diffusion models has motivated extensive exploration of their potential application to video. Zero-shot methods seek to extend image diffusion models to videos without requiring model training. Recent methods mainly focus on incorporating inter-frame correspondence into attention mechanisms. However, the soft constraint imposed on determining where to attend to valid features can sometimes be insufficient, resulting in temporal inconsistency. In this paper, we introduce FRESCO, which combines intra-frame correspondence with inter-frame correspondence to establish a more robust spatial-temporal constraint. This enhancement ensures a more consistent transformation of semantically similar content across frames. Beyond mere attention guidance, our approach involves an explicit update of features to achieve high spatial-temporal consistency with the input video, significantly improving the visual coherence of the resulting translated videos. Extensive experiments demonstrate the effectiveness of our proposed framework in producing high-quality, coherent videos, marking a notable improvement over existing zero-shot methods.
