ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler
October 8, 2024
Authors: Serin Yang, Taesung Kwon, Jong Chul Ye
cs.AI
Abstract
Recent progress in large-scale text-to-video (T2V) and image-to-video (I2V)
diffusion models has greatly enhanced video generation, especially in terms of
keyframe interpolation. However, current image-to-video diffusion models, while
powerful in generating videos from a single conditioning frame, need adaptation
for two-frame (start and end) conditioned generation, which is essential for
effective bounded interpolation. Unfortunately, existing approaches that fuse
temporally forward and backward paths in parallel often suffer from
off-manifold issues, leading to artifacts or requiring multiple iterative
re-noising steps. In this work, we introduce a novel, bidirectional sampling
strategy to address these off-manifold issues without requiring extensive
re-noising or fine-tuning. Our method employs sequential sampling along both
forward and backward paths, conditioned on the start and end frames,
respectively, ensuring more coherent and on-manifold generation of intermediate
frames. Additionally, we incorporate advanced guidance techniques, CFG++ and
DDS, to further enhance the interpolation process. By integrating these, our
method achieves state-of-the-art performance, efficiently generating
high-quality, smooth videos between keyframes. On a single 3090 GPU, our method
can interpolate 25 frames at 1024 x 576 resolution in just 195 seconds,
establishing it as a leading solution for keyframe interpolation.
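
The sequential forward-then-backward sampling described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: each frame is reduced to a single number, `toy_denoise_step` is a hypothetical stand-in for one step of an image-to-video diffusion model conditioned on its first frame, and the light re-noising between the two passes is an assumption about how the passes are bridged. CFG++ and DDS guidance are omitted entirely. The sketch only shows the control flow: within each step, one pass is conditioned on the start frame, then the time axis is flipped so that the same first-frame-conditioned model can be conditioned on the end frame.

```python
import random


def toy_denoise_step(frames, first_frame, strength):
    """Hypothetical stand-in for one I2V denoising step.

    Pulls every frame toward the conditioning (first) frame; the pull
    fades linearly to zero at the far end of the clip.
    """
    n = len(frames)
    out = []
    for i, x in enumerate(frames):
        w = strength * (1.0 - i / (n - 1))
        out.append(x + w * (first_frame - x))
    return out


def renoise(frames, sigma, rng):
    """Add Gaussian noise with standard deviation sigma to every frame."""
    return [x + sigma * rng.gauss(0.0, 1.0) for x in frames]


def bidirectional_sample(start, end, num_frames=9, num_steps=20, seed=0):
    """Toy sequential bidirectional sampler (assumed structure, see lead-in)."""
    rng = random.Random(seed)
    frames = [rng.gauss(0.0, 1.0) for _ in range(num_frames)]  # pure noise
    for step in range(num_steps):
        sigma = 1.0 - (step + 1) / num_steps  # decays to 0 at the last step
        # Forward pass: conditioned on the start frame.
        frames = toy_denoise_step(frames, start, 0.5)
        # Assumed bridge between the passes: a single light re-noising.
        frames = renoise(frames, 0.1 * sigma, rng)
        # Backward pass: flip time so the end frame acts as the first frame,
        # take one step, then flip back.
        frames = toy_denoise_step(frames[::-1], end, 0.5)[::-1]
    return frames


if __name__ == "__main__":
    clip = bidirectional_sample(start=0.0, end=1.0)
    print([round(x, 3) for x in clip])
```

Because the two passes run one after the other on the same sample, rather than being averaged in parallel, every intermediate state in this toy is the output of a single conditioned step. That mirrors, loosely, the abstract's argument for why sequential sampling keeps the result closer to the data manifold than parallel fusion does.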