Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation
March 5, 2024
Authors: Weijie Li, Litong Gong, Yiran Zhu, Fanda Fan, Biao Wang, Tiezheng Ge, Bo Zheng
cs.AI
Abstract
Image-to-video (I2V) generation tasks consistently struggle to maintain high fidelity in open domains. Traditional image animation techniques focus primarily on specific domains such as faces or human poses, making them difficult to generalize to open domains. Several recent I2V frameworks based on diffusion models can generate dynamic content for open-domain images but fail to maintain fidelity. We find that two main factors behind low fidelity are the loss of image details and noise prediction biases during the denoising process. To this end, we propose an effective method that can be applied to mainstream video diffusion models. The method achieves high fidelity by supplementing more precise image information and rectifying the predicted noise. Specifically, given an input image, our method first adds noise to the image latent to preserve more details, then denoises the noisy latent with proper rectification to alleviate the noise prediction biases. Our method is tuning-free and plug-and-play. Experimental results demonstrate the effectiveness of our approach in improving the fidelity of generated videos. For more image-to-video generation results, please refer to the project website: https://noise-rectification.github.io.
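
The two-step procedure described in the abstract (forward-diffuse the image latent with a known noise, then correct the model's noise prediction during denoising) can be illustrated with a minimal sketch. The snippet below assumes a diffusers-style scheduler exposing set_timesteps, add_noise, and step, and a UNet whose forward pass returns a .sample field; the function name rectified_denoise, the linear mixing weight lam, and the exact rectification form are illustrative assumptions, not the paper's implementation.

```python
import torch

def rectified_denoise(unet, scheduler, image_latent, num_steps=50, lam=0.6):
    """Illustrative sketch of add-noise-then-rectify denoising.

    Step 1: forward-diffuse the input image latent with a *known* noise
    eps, so the initial noisy latent still carries the image's details.
    Step 2: at each denoising step, blend the model's noise prediction
    with the known eps before stepping, counteracting prediction bias.
    """
    scheduler.set_timesteps(num_steps)
    eps = torch.randn_like(image_latent)                 # noise we add ourselves, so it is known exactly
    t0 = scheduler.timesteps[0]                          # largest timestep in the schedule
    latent = scheduler.add_noise(image_latent, eps, t0)  # noisy latent that retains image detail

    for t in scheduler.timesteps:
        # Unconditional call for brevity; real video UNets also take text/image conditioning.
        eps_pred = unet(latent, t).sample
        # Rectification (assumed form): pull the prediction toward the known noise.
        eps_rect = lam * eps + (1.0 - lam) * eps_pred
        latent = scheduler.step(eps_rect, t, latent).prev_sample
    return latent
```

Because the added noise is known rather than sampled by the scheduler internally, the blend eps_rect can anchor each denoising step to the original image content, which is one plausible reading of why the approach is tuning-free and plug-and-play with existing video diffusion models.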