Hierarchical Patch Diffusion Models for High-Resolution Video Generation
June 12, 2024
Authors: Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Sergey Tulyakov
cs.AI
Abstract
Diffusion models have demonstrated remarkable performance in image and video
synthesis. However, scaling them to high-resolution inputs is challenging and
requires restructuring the diffusion pipeline into multiple independent
components, limiting scalability and complicating downstream applications. We
study patch diffusion models (PDMs), which model the distribution of patches
rather than whole inputs. This makes training very efficient and unlocks
end-to-end optimization on high-resolution videos. We improve PDMs in two
principled ways. First, to
enforce consistency between patches, we develop deep context fusion -- an
architectural technique that propagates the context information from low-scale
to high-scale patches in a hierarchical manner. Second, to accelerate training
and inference, we propose adaptive computation, which allocates more network
capacity and computation towards coarse image details. The resulting model sets
a new state-of-the-art FVD score of 66.32 and Inception Score of 87.68 in
class-conditional video generation on UCF-101 256^2, surpassing recent
methods by more than 100%. Then, we show that it can be rapidly fine-tuned from
a base 36 × 64 low-resolution generator for high-resolution 64 × 288 × 512
text-to-video synthesis. To the best of our knowledge, our model is the first
diffusion-based architecture that is trained at such high resolutions entirely
end-to-end. Project webpage: https://snap-research.github.io/hpdm
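
The idea of a patch hierarchy, where coarse patches cover the whole frame and finer patches cover progressively smaller regions at the same patch size, can be pictured with a small NumPy sketch. This is an illustration only and not the authors' implementation: the function name `sample_patch_pyramid`, the halving of the window at each level, the nearest-neighbour resampling, and the choice of three levels are all assumptions made for the example.

```python
import numpy as np


def sample_patch_pyramid(video, patch_hw=(36, 64), num_levels=3, rng=None):
    """Return nested crops of `video`, each resampled to `patch_hw`.

    video: array of shape (T, H, W, C).
    Level 0 covers the whole frame. Each later level covers a randomly
    placed window, half as tall and half as wide, inside the previous one.
    (Hypothetical helper; the halving ratio is an assumption.)
    """
    rng = np.random.default_rng() if rng is None else rng
    _, height, width, _ = video.shape
    ph, pw = patch_hw
    top, left, h, w = 0, 0, height, width
    patches, windows = [], []
    for level in range(num_levels):
        if level > 0:
            new_h, new_w = h // 2, w // 2
            # Place the smaller window uniformly at random inside the parent.
            top += int(rng.integers(0, h - new_h + 1))
            left += int(rng.integers(0, w - new_w + 1))
            h, w = new_h, new_w
        # Nearest-neighbour resampling of the window to the patch size.
        rows = top + (np.arange(ph) * h) // ph
        cols = left + (np.arange(pw) * w) // pw
        patches.append(video[:, rows[:, None], cols[None, :], :])
        windows.append((top, left, h, w))
    return patches, windows


# Usage: a 288 x 512 clip (4 frames here to keep memory small).
video = np.zeros((4, 288, 512, 3), dtype=np.uint8)
patches, windows = sample_patch_pyramid(video, rng=np.random.default_rng(0))
```

Every level has the same array shape, so a single network could in principle process all of them; only the area of the frame that each patch covers changes from level to level.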