FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation

March 19, 2024
Authors: Shuai Yang, Yifan Zhou, Ziwei Liu, Chen Change Loy
cs.AI

Abstract

The remarkable efficacy of text-to-image diffusion models has motivated extensive exploration of their potential application in video domains. Zero-shot methods seek to extend image diffusion models to videos without necessitating model training. Recent methods mainly focus on incorporating inter-frame correspondence into attention mechanisms. However, the soft constraint imposed on determining where to attend to valid features can sometimes be insufficient, resulting in temporal inconsistency. In this paper, we introduce FRESCO, intra-frame correspondence alongside inter-frame correspondence to establish a more robust spatial-temporal constraint. This enhancement ensures a more consistent transformation of semantically similar content across frames. Beyond mere attention guidance, our approach involves an explicit update of features to achieve high spatial-temporal consistency with the input video, significantly improving the visual coherence of the resulting translated videos. Extensive experiments demonstrate the effectiveness of our proposed framework in producing high-quality, coherent videos, marking a notable improvement over existing zero-shot methods.
