Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models

Abstract

In this paper, we propose LSRNA, a novel framework for higher-resolution (exceeding 1K) image generation with diffusion models that performs super-resolution directly in the latent space. Existing diffusion models struggle to scale beyond their training resolutions, often producing structural distortions or repeated content. Reference-based methods address these issues by upsampling a low-resolution reference to guide higher-resolution generation. However, they face significant challenges: upsampling in latent space often causes manifold deviation, which degrades output quality, while upsampling in RGB space tends to produce overly smoothed outputs. To overcome these limitations, LSRNA combines Latent space Super-Resolution (LSR) for manifold alignment with Region-wise Noise Addition (RNA) to enhance high-frequency details. Our extensive experiments demonstrate that integrating LSRNA outperforms state-of-the-art reference-based methods across various resolutions and metrics, and they show the critical role of latent space upsampling in preserving detail and sharpness. The code is available at https://github.com/3587jjh/LSRNA.
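
To make the idea of region-wise noise addition more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "region-wise" means the noise strength varies across spatial positions and that a simple gradient-magnitude map can stand in for whatever detail cue the method actually uses. The function names (`gradient_magnitude`, `noise_scale_map`, `add_regionwise_noise`) and the parameters `s_min` and `s_max` are hypothetical, and the example uses only the Python standard library on a toy single-channel grid.

```python
import random

def gradient_magnitude(img):
    """Central-difference gradient magnitude of a 2D grid (list of rows)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clamp indices at the border so edge pixels reuse their neighbor.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out

def noise_scale_map(detail, s_min=0.0, s_max=0.5):
    """Linearly rescale a detail map to noise std-devs in [s_min, s_max].

    s_min and s_max are illustrative values, not taken from the paper.
    """
    flat = [v for row in detail for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat image
    return [[s_min + (s_max - s_min) * (v - lo) / span for v in row]
            for row in detail]

def add_regionwise_noise(latent, scales, rng):
    """Add zero-mean Gaussian noise whose std-dev differs per position."""
    return [[z + s * rng.gauss(0.0, 1.0) for z, s in zip(z_row, s_row)]
            for z_row, s_row in zip(latent, scales)]

rng = random.Random(0)
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]  # toy image, one vertical edge
latent = [[0.0] * 4 for _ in range(4)]            # stand-in for an upsampled latent
scales = noise_scale_map(gradient_magnitude(image))
noisy = add_regionwise_noise(latent, scales, rng)
```

In this toy case the flat columns receive no noise and the two columns around the edge receive noise with standard deviation `s_max`. A real pipeline would presumably resize the detail map to the latent's spatial size and apply it across all latent channels; those steps are omitted here.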
