Guiding a Diffusion Transformer with the Internal Dynamics of Itself

December 30, 2025
Authors: Xingyu Zhou, Qifan Li, Xiaobin Hu, Hai Chen, Shuhang Gu
cs.AI

Abstract

Diffusion models have a strong ability to capture the entire (conditional) data distribution. However, because training and data are insufficient for the model to learn to cover low-probability regions, it is penalized for failing to generate high-quality images in those regions. To improve generation quality, guidance strategies such as classifier-free guidance (CFG) can steer samples toward high-probability regions during sampling. However, standard CFG often produces over-simplified or distorted samples. The alternative line of work, which guides a diffusion model with a degraded ("bad") version of itself, is limited by its reliance on carefully designed degradation strategies, extra training, and additional sampling steps. In this paper, we propose Internal Guidance (IG), a simple yet effective strategy that adds auxiliary supervision to an intermediate layer during training and, during sampling, extrapolates between the intermediate-layer and deep-layer outputs to obtain the generated results. This strategy significantly improves both training efficiency and generation quality across various baselines. On ImageNet 256×256, SiT-XL/2+IG achieves FID=5.31 and FID=1.75 at 80 and 800 epochs, respectively. More notably, LightningDiT-XL/1+IG achieves FID=1.34, outperforming all compared methods by a large margin. Combined with CFG, LightningDiT-XL/1+IG achieves a state-of-the-art FID of 1.19.
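The abstract describes IG only at a high level. The following is a minimal Python sketch, under stated assumptions, of the two ingredients it names. It assumes (1) the auxiliary supervision is the same regression loss used for the final output (MSE here), applied to a prediction read out from an intermediate layer and weighted by a hypothetical `aux_weight`, and (2) the sampling-time extrapolation is a linear, CFG-style update `intermediate + w * (deep - intermediate)` with a hypothetical guidance weight `w`. Neither the loss form nor the extrapolation formula is given in the abstract, and the function names are illustrative, not the authors' code. Standard CFG is included only for comparison.

```python
from typing import List, Sequence

Vector = List[float]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def training_loss(intermediate_out: Sequence[float], deep_out: Sequence[float],
                  target: Sequence[float], aux_weight: float = 1.0) -> float:
    """ASSUMED IG objective: usual loss on the deep (final) output plus an
    auxiliary loss of the same form on the intermediate-layer output."""
    return mse(deep_out, target) + aux_weight * mse(intermediate_out, target)


def internal_guidance(intermediate_out: Sequence[float], deep_out: Sequence[float],
                      w: float) -> Vector:
    """ASSUMED IG sampling rule: extrapolate from the intermediate-layer
    prediction toward, and for w > 1 beyond, the deep-layer prediction."""
    if len(intermediate_out) != len(deep_out):
        raise ValueError("intermediate and deep outputs must have the same shape")
    return [s + w * (d - s) for s, d in zip(intermediate_out, deep_out)]


def classifier_free_guidance(uncond_out: Sequence[float], cond_out: Sequence[float],
                             w: float) -> Vector:
    """Standard CFG for comparison: uncond + w * (cond - uncond)."""
    return [u + w * (c - u) for u, c in zip(uncond_out, cond_out)]


if __name__ == "__main__":
    shallow, deep, target = [0.0, 1.0], [1.0, 1.0], [1.0, 1.0]
    print(training_loss(shallow, deep, target))     # auxiliary term dominates here
    print(internal_guidance(shallow, deep, w=1.0))  # equals the deep output
    print(internal_guidance(shallow, deep, w=1.5))  # pushed past the deep output
```

Under these assumptions, w = 1 returns the deep-layer prediction unchanged, while w > 1 pushes the output further away from the intermediate-layer prediction, much as CFG pushes away from the unconditional prediction. Because both predictions would come from the same network, this form would need no separately trained weak model, which is consistent with the advantages the abstract claims over guiding with a degraded model.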