ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration

April 11, 2025
Authors: Yongsheng Yu, Haitian Zheng, Zhifei Zhang, Jianming Zhang, Yuqian Zhou, Connelly Barnes, Yuchen Liu, Wei Xiong, Zhe Lin, Jiebo Luo
cs.AI

Abstract

Recent progress in generative models has significantly improved image restoration capabilities, particularly through powerful diffusion models that offer remarkable recovery of semantic details and local fidelity. However, deploying these models at ultra-high resolutions faces a critical trade-off between quality and efficiency due to the computational demands of long-range attention mechanisms. To address this, we introduce ZipIR, a novel framework that enhances efficiency, scalability, and long-range modeling for high-resolution image restoration. ZipIR employs a highly compressed latent representation that compresses images by 32x, effectively reducing the number of spatial tokens and enabling the use of high-capacity models like the Diffusion Transformer (DiT). Toward this goal, we propose a Latent Pyramid VAE (LP-VAE) design that structures the latent space into sub-bands to ease diffusion training. Trained on full images up to 2K resolution, ZipIR surpasses existing diffusion-based methods, offering unmatched speed and quality in restoring high-resolution images from severely degraded inputs.
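
To make the token-count argument concrete, the following is a rough back-of-the-envelope sketch rather than anything taken from the paper. It assumes that "2K" means a 2048x2048 image, that the 32x factor applies per spatial side, that each latent position becomes one token (patch size 1), and that the baseline is the 8x per-side factor common in latent diffusion VAEs. The helper names `latent_tokens` and `attention_pairs` are hypothetical.

```python
def latent_tokens(height: int, width: int, downsample: int, patch: int = 1) -> int:
    """Spatial tokens left after downsampling each side by `downsample`
    and grouping latents into `patch` x `patch` tokens (assumed layout)."""
    stride = downsample * patch
    return (height // stride) * (width // stride)


def attention_pairs(tokens: int) -> int:
    """Query-key pairs scored by one full self-attention layer (quadratic in tokens)."""
    return tokens * tokens


side = 2048  # assumption: "2K" taken as a 2048 x 2048 image

tokens_8x = latent_tokens(side, side, downsample=8)    # assumed 8x baseline
tokens_32x = latent_tokens(side, side, downsample=32)  # 32x compression, per side

print("tokens at 8x: ", tokens_8x)
print("tokens at 32x:", tokens_32x)
print("token reduction:", tokens_8x // tokens_32x)
print("attention-pair reduction:",
      attention_pairs(tokens_8x) // attention_pairs(tokens_32x))
```

Under these assumptions, going from 8x to 32x cuts the token count by a factor of 16 and the number of attention pairs by a factor of 256, which is the kind of saving that makes full-image attention at this resolution plausible. The actual patch size and tokenization used by ZipIR may differ.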
