BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way

October 8, 2024
Authors: Jiazi Bu, Pengyang Ling, Pan Zhang, Tong Wu, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang
cs.AI

Abstract

Text-to-video (T2V) generation models, which offer convenient visual creation, have recently attracted increasing attention. Despite their substantial potential, the generated videos may exhibit artifacts, including structural implausibility, temporal inconsistency, and a lack of motion, often resulting in near-static video. In this work, we identify a correlation between the disparity of temporal attention maps across different blocks and the occurrence of temporal inconsistencies. Additionally, we observe that the energy contained within the temporal attention maps is directly related to the motion amplitude in the generated videos. Based on these observations, we present BroadWay, a training-free method that improves the quality of text-to-video generation without introducing additional parameters or increasing memory or sampling time. Specifically, BroadWay is composed of two principal components: 1) Temporal Self-Guidance improves the structural plausibility and temporal consistency of generated videos by reducing the disparity between the temporal attention maps across various decoder blocks. 2) Fourier-based Motion Enhancement enhances the magnitude and richness of motion by amplifying the energy of the temporal attention map. Extensive experiments demonstrate that BroadWay significantly improves the quality of text-to-video generation with negligible additional cost.
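The two components named in the abstract can be illustrated with a small sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the linear blend used for self-guidance, the choice to scale every non-DC frequency component of each attention row, and the names `temporal_self_guidance`, `fourier_motion_enhancement`, `alpha`, and `beta`. It is not the authors' implementation. Attention maps are plain lists of rows, one row per frame.

```python
import cmath


def temporal_self_guidance(attn_prev, attn_cur, alpha=0.5):
    """Pull the current block's temporal attention map toward an earlier block's.

    Assumption: a linear blend with strength `alpha`; the abstract only says
    the disparity between blocks is reduced.
    """
    return [[c + alpha * (p - c) for p, c in zip(row_p, row_c)]
            for row_p, row_c in zip(attn_prev, attn_cur)]


def dft(xs):
    """Naive discrete Fourier transform (fine for a handful of frames)."""
    n = len(xs)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(xs))
            for k in range(n)]


def idft(coeffs):
    """Inverse of `dft`; returns the real part of each sample."""
    n = len(coeffs)
    return [sum(c * cmath.exp(2j * cmath.pi * k * t / n)
                for k, c in enumerate(coeffs)).real / n
            for t in range(n)]


def fourier_motion_enhancement(attn, beta=1.5):
    """Amplify the energy of each attention row in the frequency domain.

    Assumption: the DC component is kept and all other components are scaled
    by `beta`, so each row keeps its sum while its energy grows.
    """
    enhanced = []
    for row in attn:
        coeffs = dft(row)
        scaled = [coeffs[0]] + [beta * c for c in coeffs[1:]]
        enhanced.append(idft(scaled))
    return enhanced


def energy(attn):
    """Sum of squared entries of an attention map."""
    return sum(v * v for row in attn for v in row)


def disparity(a, b):
    """Squared difference between two attention maps."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

With `beta > 1` the energy of any non-constant row increases while its sum is unchanged, and with `0 < alpha <= 1` the blended map is closer to the earlier block's map than the original was. Large `beta` values can push individual entries below zero, which this sketch does not handle.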
