
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise

January 14, 2025
作者: Ryan Burgert, Yuancheng Xu, Wenqi Xian, Oliver Pilarski, Pascal Clausen, Mingming He, Li Ma, Yitong Deng, Lingxiao Li, Mohsen Mousavi, Michael Ryoo, Paul Debevec, Ning Yu
cs.AI

Abstract

Generative modeling aims to transform random noise into structured outputs. In this work, we enhance video diffusion models by allowing motion control via structured latent noise sampling. This is achieved by just a change in data: we pre-process training videos to yield structured noise. Consequently, our method is agnostic to diffusion model design, requiring no changes to model architectures or training pipelines. Specifically, we propose a novel noise warping algorithm, fast enough to run in real time, that replaces random temporal Gaussianity with correlated warped noise derived from optical flow fields, while preserving the spatial Gaussianity. The efficiency of our algorithm enables us to fine-tune modern video diffusion base models using warped noise with minimal overhead, and provide a one-stop solution for a wide range of user-friendly motion control: local object motion control, global camera movement control, and motion transfer. The harmonization between temporal coherence and spatial Gaussianity in our warped noise leads to effective motion control while maintaining per-frame pixel quality. Extensive experiments and user studies demonstrate the advantages of our method, making it a robust and scalable approach for controlling motion in video diffusion models. Video results are available on our webpage: https://vgenai-netflix-eyeline-research.github.io/Go-with-the-Flow. Source code and model checkpoints are available on GitHub: https://github.com/VGenAI-Netflix-Eyeline-Research/Go-with-the-Flow.
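
To make the core mechanism concrete, here is a minimal sketch, assuming PyTorch, of one naive way to derive temporally correlated noise from optical flow. This is an illustration under stated assumptions, not the paper's algorithm: the function name `warp_noise`, the bilinear backward warp, and the per-channel re-standardization are all hypothetical choices, and the authors' actual method preserves spatial Gaussianity exactly while running in real time, which this naive version does not guarantee.

```python
# Illustrative sketch of flow-warped noise sampling -- an assumption-laden
# stand-in, NOT the paper's exact algorithm. It advects the previous
# frame's Gaussian noise along an optical-flow field with bilinear
# backward warping, then re-standardizes each channel so the marginals
# stay approximately unit Gaussian. All names are hypothetical.
import torch
import torch.nn.functional as F

def warp_noise(prev_noise: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """prev_noise: (C, H, W) Gaussian noise for frame t-1.
    flow: (2, H, W) forward optical flow (dx, dy) in pixels, t-1 -> t."""
    _, H, W = prev_noise.shape
    dev = prev_noise.device
    # Backward warping: each output pixel samples the noise from the
    # location the flow says it came from in the previous frame.
    ys, xs = torch.meshgrid(
        torch.arange(H, device=dev), torch.arange(W, device=dev), indexing="ij"
    )
    src_x = xs.float() - flow[0]
    src_y = ys.float() - flow[1]
    # grid_sample expects coordinates normalized to [-1, 1], x before y.
    grid = torch.stack(
        [2.0 * src_x / (W - 1) - 1.0, 2.0 * src_y / (H - 1) - 1.0], dim=-1
    )
    warped = F.grid_sample(
        prev_noise.unsqueeze(0), grid.unsqueeze(0),
        mode="bilinear", padding_mode="reflection", align_corners=True,
    ).squeeze(0)
    # Bilinear interpolation averages i.i.d. samples and shrinks variance;
    # per-channel re-standardization is a crude proxy for the paper's
    # exact spatial-Gaussianity preservation.
    return (warped - warped.mean(dim=(1, 2), keepdim=True)) / (
        warped.std(dim=(1, 2), keepdim=True) + 1e-6
    )

# Hypothetical usage: chain warps over a flow sequence to build a
# temporally correlated noise volume for a 16-frame latent video.
frames = [torch.randn(4, 64, 64)]
flows = torch.zeros(15, 2, 64, 64)  # placeholder flows; use real optical flow
for t in range(15):
    frames.append(warp_noise(frames[-1], flows[t]))
noise_volume = torch.stack(frames)  # (16, 4, 64, 64)
```

The design intuition, as the abstract describes, is that the noise volume fed to the diffusion model carries the motion signal (correlation across time follows the flow) while each individual frame still looks like i.i.d. Gaussian noise spatially, so per-frame pixel quality is not degraded.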
