DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion
October 23, 2025
Authors: Noam Issachar, Guy Yariv, Sagie Benaim, Yossi Adi, Dani Lischinski, Raanan Fattal
cs.AI
Abstract
Diffusion Transformer models can generate images with remarkable fidelity and detail, yet training them at ultra-high resolutions remains extremely costly due to the self-attention mechanism's quadratic scaling with the number of image tokens. In this paper, we introduce Dynamic Position Extrapolation (DyPE), a novel, training-free method that enables pre-trained diffusion transformers to synthesize images at resolutions far beyond their training data, with no additional sampling cost. DyPE takes advantage of the spectral progression inherent to the diffusion process, where low-frequency structures converge early, while high frequencies take more steps to resolve. Specifically, DyPE dynamically adjusts the model's positional encoding at each diffusion step, matching its frequency spectrum to the current stage of the generative process. This approach allows us to generate images at resolutions that dramatically exceed the training resolution, e.g., 16 million pixels using FLUX. On multiple benchmarks, DyPE consistently improves performance and achieves state-of-the-art fidelity in ultra-high-resolution image generation, with gains becoming even more pronounced at higher resolutions. The project page is available at https://noamissachar.github.io/DyPE/.
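The abstract does not spell out how the per-step adjustment is computed. As a rough illustration only, the sketch below shows one plausible reading of the idea: a training-free NTK-aware base rescaling of rotary position embeddings (RoPE), a standard length-extrapolation trick, made time-dependent so the positional spectrum is fully stretched early in sampling and relaxes back toward the training-time frequencies as denoising finishes. The function name dype_rope_frequencies, the linear schedule s_t, and the convention that t runs from 1 (pure noise) to 0 (clean image) are assumptions for illustration, not the paper's actual formulation.

import torch

def dype_rope_frequencies(dim: int, train_len: int, target_len: int,
                          t: float, base: float = 10000.0) -> torch.Tensor:
    """Illustrative time-dependent RoPE inverse frequencies (a sketch, not
    the paper's exact method). t is diffusion progress: 1.0 at pure noise,
    0.0 at the final step.

    Early steps (t near 1) apply the full extrapolation scaling, stretching
    the positional spectrum so low-frequency structure spans the larger
    canvas; late steps relax toward the training-time frequencies so
    high-frequency detail is resolved with the encoding the model was
    trained on.
    """
    s_max = target_len / train_len          # full length-extrapolation factor
    s_t = 1.0 + (s_max - 1.0) * t           # hypothetical linear schedule in t
    # NTK-aware base rescaling, a common training-free RoPE extension
    base_t = base * s_t ** (dim / (dim - 2))
    inv_freq = 1.0 / base_t ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    return inv_freq

# At the first sampling step (t=1.0) the spectrum is fully stretched for the
# target resolution; by the last step (t=0.0) it matches the training setup.
early = dype_rope_frequencies(dim=128, train_len=1024, target_len=4096, t=1.0)
late = dype_rope_frequencies(dim=128, train_len=1024, target_len=4096, t=0.0)

Under this reading, the schedule mirrors the spectral progression the abstract describes: the encoding's low frequencies are available from the first step, while the encoding seen by the model at the final steps is the one it was trained with, which is when high-frequency detail is being resolved.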