
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis

December 29, 2023
作者: Feng Liang, Bichen Wu, Jialiang Wang, Licheng Yu, Kunpeng Li, Yinan Zhao, Ishan Misra, Jia-Bin Huang, Peizhao Zhang, Peter Vajda, Diana Marculescu
cs.AI

Abstract

Diffusion models have transformed image-to-image (I2I) synthesis and are now permeating into videos. However, the advancement of video-to-video (V2V) synthesis has been hampered by the challenge of maintaining temporal consistency across video frames. This paper proposes a consistent V2V synthesis framework that jointly leverages spatial conditions and temporal optical-flow clues within the source video. Contrary to prior methods that strictly adhere to optical flow, our approach harnesses its benefits while handling the imperfections in flow estimation. We encode the optical flow by warping from the first frame and use it as a supplementary reference in the diffusion model. This enables our model to synthesize videos by editing the first frame with any prevalent I2I model and then propagating the edits to successive frames. Our V2V model, FlowVid, demonstrates remarkable properties: (1) Flexibility: FlowVid works seamlessly with existing I2I models, facilitating various modifications, including stylization, object swaps, and local edits. (2) Efficiency: Generating a 4-second video at 30 FPS and 512x512 resolution takes only 1.5 minutes, which is 3.1x, 7.2x, and 10.5x faster than CoDeF, Rerender, and TokenFlow, respectively. (3) High quality: In user studies, our FlowVid is preferred 45.7% of the time, outperforming CoDeF (3.5%), Rerender (10.2%), and TokenFlow (40.4%).
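
The propagation step described in the abstract can be pictured with a minimal sketch (not the authors' implementation): warp the edited first frame toward each later frame using per-frame optical flow, producing imperfect warped references that a diffusion model could condition on. The function name `warp_first_frame`, the PyTorch backend, and the flow convention (backward flow in pixel units) are assumptions made purely for illustration.

```python
# Illustrative sketch of flow-based warping of an edited first frame.
# Assumes backward flow: for each pixel in a target frame, `flow` points
# back to its source location in the first frame, in pixel units (dx, dy).
import torch
import torch.nn.functional as F


def warp_first_frame(first_frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp `first_frame` with optical `flow`.

    first_frame: (B, C, H, W) edited first frame.
    flow:        (B, 2, H, W) backward flow toward the first frame, in pixels.
    Returns an (imperfect) warped reference frame of shape (B, C, H, W).
    """
    b, _, h, w = flow.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, device=flow.device, dtype=flow.dtype),
        torch.arange(w, device=flow.device, dtype=flow.dtype),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]  # (B, H, W)
    grid_y = ys.unsqueeze(0) + flow[:, 1]
    # Normalize to [-1, 1], the coordinate range expected by grid_sample.
    grid_x = 2.0 * grid_x / (w - 1) - 1.0
    grid_y = 2.0 * grid_y / (h - 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(first_frame, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)


if __name__ == "__main__":
    # Toy usage: warp a random "edited first frame" toward 3 later frames.
    edited_first = torch.rand(1, 3, 64, 64)
    flows = [torch.randn(1, 2, 64, 64) * 2.0 for _ in range(3)]  # placeholder flows
    references = [warp_first_frame(edited_first, f) for f in flows]
    print([r.shape for r in references])
```

Because estimated flow is imperfect, the abstract indicates such warped frames serve only as supplementary references for the diffusion model rather than as hard constraints on the output.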