DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion

October 23, 2025
Authors: Noam Issachar, Guy Yariv, Sagie Benaim, Yossi Adi, Dani Lischinski, Raanan Fattal
cs.AI

Abstract

Diffusion Transformer models can generate images with remarkable fidelity and detail, yet training them at ultra-high resolutions remains extremely costly due to the self-attention mechanism's quadratic scaling with the number of image tokens. In this paper, we introduce Dynamic Position Extrapolation (DyPE), a novel, training-free method that enables pre-trained diffusion transformers to synthesize images at resolutions far beyond their training data, with no additional sampling cost. DyPE takes advantage of the spectral progression inherent to the diffusion process, where low-frequency structures converge early, while high frequencies take more steps to resolve. Specifically, DyPE dynamically adjusts the model's positional encoding at each diffusion step, matching its frequency spectrum to the current stage of the generative process. This approach allows us to generate images at resolutions that dramatically exceed the training resolution, e.g., 16 million pixels using FLUX. On multiple benchmarks, DyPE consistently improves performance and achieves state-of-the-art fidelity in ultra-high-resolution image generation, with gains becoming even more pronounced at higher resolutions. The project page is available at https://noamissachar.github.io/DyPE/.
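The abstract describes the mechanism only at a high level, but the core idea, rescaling the frequencies of a rotary positional encoding (RoPE, as used in FLUX) as a function of the diffusion timestep, can be sketched concretely. The snippet below is a minimal illustration under stated assumptions: the linear-in-t schedule, the function names (`dype_inv_frequencies`, `rotary_angles`), and all parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np


def rope_inv_frequencies(dim: int, base: float = 10000.0) -> np.ndarray:
    """Standard RoPE inverse frequencies for one head dimension."""
    return 1.0 / (base ** (np.arange(0, dim, 2) / dim))


def dype_inv_frequencies(dim: int, t: float, train_len: int,
                         target_len: int, base: float = 10000.0) -> np.ndarray:
    """Hypothetical per-step rescaling of RoPE frequencies.

    t is the diffusion time in [0, 1], with t = 1 at pure noise and
    t = 0 at the final image. Early steps (large t) compress positions
    into the trained range, preserving the low-frequency layout that
    converges first; as t -> 0 the scaling relaxes toward native
    (extrapolated) positions so high-frequency detail can be placed
    at the target resolution.
    """
    scale = target_len / train_len              # e.g. 4x the trained length
    effective_scale = 1.0 + (scale - 1.0) * t   # illustrative schedule only
    return rope_inv_frequencies(dim, base) / effective_scale


def rotary_angles(positions: np.ndarray, inv_freq: np.ndarray) -> np.ndarray:
    """Angle table (num_positions x dim/2) used to rotate query/key pairs."""
    return np.outer(positions, inv_freq)


if __name__ == "__main__":
    dim, train_len, target_len = 64, 1024, 4096
    positions = np.arange(target_len)
    for t in (1.0, 0.5, 0.0):
        angles = rotary_angles(
            positions, dype_inv_frequencies(dim, t, train_len, target_len))
        print(f"t={t:.1f}  max angle at lowest frequency: "
              f"{angles[-1, 0]:.1f} rad")
```

In this sketch, at t = 1 the encoding is fully interpolated into the training range, and it relaxes to plain extrapolation by t = 0, mirroring the abstract's observation that low-frequency structure converges early while high frequencies resolve late; the actual DyPE schedule may differ.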