One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models
November 13, 2025
Authors: Aleksandr Razin, Danil Kazantsev, Ilya Makarov
cs.AI
Abstract
Diffusion models struggle to scale beyond their training resolutions, as direct high-resolution sampling is slow and costly, while post-hoc image super-resolution (ISR) introduces artifacts and additional latency by operating after decoding. We present the Latent Upscaler Adapter (LUA), a lightweight module that performs super-resolution directly on the generator's latent code before the final VAE decoding step. LUA integrates as a drop-in component, requiring no modifications to the base model or additional diffusion stages, and enables high-resolution synthesis through a single feed-forward pass in latent space. A shared Swin-style backbone with scale-specific pixel-shuffle heads supports 2x and 4x factors and remains compatible with image-space SR baselines, achieving comparable perceptual quality with nearly 3x lower decoding and upscaling time (adding only +0.42 s for 1024 px generation from 512 px, compared to 1.87 s for pixel-space SR using the same SwinIR architecture). Furthermore, LUA shows strong generalization across the latent spaces of different VAEs, making it easy to deploy without retraining from scratch for each new decoder. Extensive experiments demonstrate that LUA closely matches the fidelity of native high-resolution generation while offering a practical and efficient path to scalable, high-fidelity image synthesis in modern diffusion pipelines.
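To make the "pixel-shuffle head" and the latent-space shape bookkeeping concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the learned Swin-style backbone is omitted entirely, and only the parameter-free rearrangement that a pixel-shuffle layer performs is shown. The helper names (`pixel_shuffle`, `latent_shape`, `upscaled_shape`) are hypothetical, and the VAE stride of 8 and the 4 latent channels are assumptions typical of Stable Diffusion-style VAEs, not values stated in the abstract.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Same channel-to-space layout as a standard pixel-shuffle layer:
    out[c][y*r + i][x*r + j] = in[c*r*r + i*r + j][y][x]
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for y in range(h):
                    for col in range(w):
                        out[c][y * r + i][col * r + j] = src[y][col]
    return out


def latent_shape(image_px, vae_stride=8, latent_channels=4):
    """Latent (C, H, W) for a square image.

    vae_stride and latent_channels are assumed values, not from the paper.
    """
    side = image_px // vae_stride
    return (latent_channels, side, side)


def upscaled_shape(shape, scale):
    """Shape of a latent after spatial upscaling by an integer factor."""
    c, h, w = shape
    return (c, h * scale, w * scale)


if __name__ == "__main__":
    low = latent_shape(512)          # latent of a 512 px generation
    high = upscaled_shape(low, 2)    # after a 2x latent upscale
    print(low, high, high[1] * 8)    # decoded side length in pixels
```

Under these assumptions, a 512 px generation corresponds to a 64×64 latent, the 2x head yields a 128×128 latent, and decoding that latent once gives a 1024 px image. The upscaler therefore operates on a tensor with far fewer spatial positions than a pixel-space SR model applied after decoding would see.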