Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models
March 24, 2025
Authors: Jinho Jeong, Sangmin Han, Jinwoo Kim, Seon Joo Kim
cs.AI
Abstract
In this paper, we propose LSRNA, a novel framework for higher-resolution
(exceeding 1K) image generation using diffusion models by leveraging
super-resolution directly in the latent space. Existing diffusion models
struggle with scaling beyond their training resolutions, often leading to
structural distortions or content repetition. Reference-based methods address
these issues by upsampling a low-resolution reference to guide higher-resolution
generation. However, they face significant challenges: upsampling in latent
space often causes manifold deviation, which degrades output quality. On the
other hand, upsampling in RGB space tends to produce overly smoothed outputs.
To overcome these limitations, LSRNA combines Latent space Super-Resolution
(LSR) for manifold alignment and Region-wise Noise Addition (RNA) to enhance
high-frequency details. Our extensive experiments demonstrate that integrating
LSRNA outperforms state-of-the-art reference-based methods across various
resolutions and metrics, while showing the critical role of latent space
upsampling in preserving detail and sharpness. The code is available at
https://github.com/3587jjh/LSRNA.
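
To make the idea of region-wise noise addition concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say how regions are selected or how the noise is scaled, so everything here is an assumption made for illustration: the function name `region_wise_noise`, the `detail_map` input (imagined as a normalized edge map with values in [0, 1]), the `sigma_min` and `sigma_max` parameters, and the linear mapping from map value to noise strength. A single-channel latent stored as nested lists keeps the example dependency-free.

```python
import random


def region_wise_noise(latent, detail_map, sigma_min=0.0, sigma_max=0.5, seed=0):
    """Return a copy of `latent` with Gaussian noise added per position.

    latent:     list of rows of floats (a single channel, for simplicity)
    detail_map: same shape as `latent`, values in [0, 1]; hypothetical input,
                e.g. a normalized edge map (not specified in the abstract)
    The noise standard deviation grows linearly from sigma_min (map value 0)
    to sigma_max (map value 1); this scaling rule is an assumption.
    """
    same_rows = len(latent) == len(detail_map)
    if not same_rows or any(len(r) != len(m) for r, m in zip(latent, detail_map)):
        raise ValueError("latent and detail_map must have the same shape")

    rng = random.Random(seed)  # seeded so the sketch is reproducible
    noisy = []
    for latent_row, map_row in zip(latent, detail_map):
        out_row = []
        for value, weight in zip(latent_row, map_row):
            # Stronger noise where the map marks more detail.
            sigma = sigma_min + (sigma_max - sigma_min) * weight
            out_row.append(value + rng.gauss(0.0, sigma))
        noisy.append(out_row)
    return noisy


# Left column is marked flat (0.0), right column is marked detailed (1.0).
latent = [[1.0, 1.0], [1.0, 1.0]]
detail = [[0.0, 1.0], [0.0, 1.0]]
noisy = region_wise_noise(latent, detail)
```

With the default `sigma_min=0.0`, positions marked flat are left untouched and only the positions marked as detailed are perturbed, which is the behavior the phrase "region-wise" suggests. In a real pipeline the perturbed latent would then be refined by the diffusion model; that step is outside this sketch.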