History-Guided Video Diffusion

February 10, 2025
Authors: Kiwhan Song, Boyuan Chen, Max Simchowitz, Yilun Du, Russ Tedrake, Vincent Sitzmann
cs.AI

Abstract

Classifier-free guidance (CFG) is a key technique for improving conditional generation in diffusion models, enabling more accurate control while enhancing sample quality. It is natural to extend this technique to video diffusion, which generates video conditioned on a variable number of context frames, collectively referred to as history. However, we find two key challenges to guiding with variable-length history: architectures that only support fixed-size conditioning, and the empirical observation that CFG-style history dropout performs poorly. To address this, we propose the Diffusion Forcing Transformer (DFoT), a video diffusion architecture and theoretically grounded training objective that jointly enable conditioning on a flexible number of history frames. We then introduce History Guidance, a family of guidance methods uniquely enabled by DFoT. We show that its simplest form, vanilla history guidance, already significantly improves video generation quality and temporal consistency. A more advanced method, history guidance across time and frequency, further enhances motion dynamics, enables compositional generalization to out-of-distribution history, and can stably roll out extremely long videos. Website: https://boyuan.space/history-guidance
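
The abstract does not state the guidance rule itself. As a rough illustration only, the sketch below assumes the standard classifier-free guidance combination, with the history frames playing the role of the condition. The function name `guided_prediction`, the toy numbers, and the guidance weight are hypothetical and are not taken from the paper or its code.

```python
from typing import List, Sequence


def guided_prediction(
    cond: Sequence[float], uncond: Sequence[float], weight: float
) -> List[float]:
    """Standard CFG combination: uncond + weight * (cond - uncond).

    `cond` stands in for a model prediction made with history frames,
    `uncond` for a prediction made without them (both hypothetical here).
    """
    if len(cond) != len(uncond):
        raise ValueError("predictions must have the same length")
    return [u + weight * (c - u) for c, u in zip(cond, uncond)]


# Toy numbers standing in for a model's noise predictions.
eps_with_history = [0.5, -1.0, 2.0]
eps_without_history = [0.0, -0.5, 1.0]

# weight = 1 recovers the conditional prediction; weight > 1 extrapolates
# away from the unconditional one.
print(guided_prediction(eps_with_history, eps_without_history, 2.0))
```

With a weight of 0 the result equals the prediction without history, and with a weight of 1 it equals the prediction with history, so the weight controls how strongly the history steers each denoising step in this simplified picture.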
