Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening
February 17, 2025
Authors: Ye Tian, Ling Yang, Xinchen Zhang, Yunhai Tong, Mengdi Wang, Bin Cui
cs.AI
Abstract
We propose Diffusion-Sharpening, a fine-tuning approach that enhances
downstream alignment by optimizing sampling trajectories. Existing RL-based
fine-tuning methods focus on single training timesteps and neglect
trajectory-level alignment, while recent sampling trajectory optimization
methods incur significant inference NFE (number of function evaluations) costs. Diffusion-Sharpening overcomes
this by using a path integral framework to select optimal trajectories during
training, leveraging reward feedback, and amortizing inference costs. Our
method demonstrates superior training efficiency with faster convergence, and
the best inference efficiency without requiring additional NFEs. Extensive
experiments show that Diffusion-Sharpening outperforms RL-based fine-tuning
methods (e.g., Diffusion-DPO) and sampling trajectory optimization methods
(e.g., Inference Scaling) across diverse metrics including text alignment,
compositional capabilities, and human preferences, offering a scalable and
efficient solution for future diffusion model fine-tuning. Code:
https://github.com/Gen-Verse/Diffusion-Sharpening
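
To make the trajectory-sharpening idea concrete, below is a minimal, self-contained sketch of the training loop the abstract describes: sample several candidate denoising trajectories, score the final samples with a reward model, and fine-tune the diffusion model toward the highest-reward trajectory. Everything here (ToyDenoiser, toy_reward, the toy sampler update, and the surrogate regression loss) is an illustrative assumption, not the paper's actual path integral formulation or implementation.

```python
# Illustrative sketch of reward-guided trajectory selection during training.
# All components below are toy stand-ins, not the paper's method.

import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Stand-in for a diffusion model's noise-prediction network."""
    def __init__(self, dim=8):
        super().__init__()
        self.net = nn.Linear(dim + 1, dim)

    def forward(self, x, t):
        # Concatenate a scalar timestep feature onto each sample.
        t_feat = torch.full((x.shape[0], 1), float(t))
        return self.net(torch.cat([x, t_feat], dim=-1))

def toy_reward(x):
    """Placeholder reward model: prefers samples with small norm."""
    return -x.pow(2).mean().item()

def sharpening_step(model, x_T, timesteps, n_candidates=4, lr=1e-3):
    """Sample n candidate trajectories, keep the best-rewarded one,
    and take a gradient step that pulls the model toward it."""
    candidates = []
    for _ in range(n_candidates):
        x, traj = x_T.clone(), []
        for t in timesteps:  # reverse-time denoising loop
            with torch.no_grad():
                eps = model(x, t)
            # Toy stochastic update; a real sampler (e.g., DDIM) goes here.
            x = x - 0.1 * eps + 0.01 * torch.randn_like(x)
            traj.append((t, x))
        candidates.append((toy_reward(x), traj))
    best_reward, best_traj = max(candidates, key=lambda c: c[0])

    # Surrogate fine-tuning loss along the winning trajectory: re-predict
    # the step taken at each state and regress onto it (illustrative only).
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss, prev = torch.zeros(()), x_T
    for t, x_next in best_traj:
        pred_step = model(prev, t)
        target = (prev - x_next).detach() / 0.1  # step the sampler took
        loss = loss + (pred_step - target).pow(2).mean()
        prev = x_next
    opt.zero_grad()
    loss.backward()
    opt.step()
    return best_reward

model = ToyDenoiser()
x_T = torch.randn(2, 8)
print(sharpening_step(model, x_T, timesteps=range(10, 0, -1)))
```

The select-then-regress structure above is the illustrative analog of what the abstract calls selecting optimal trajectories during training: because the reward-guided search happens at fine-tuning time, no extra NFEs are needed at inference.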