Diffusion-SDPO: Safeguarded Direct Preference Optimization for Diffusion Models

November 5, 2025
Authors: Minghao Fu, Guo-Hua Wang, Tianyu Cui, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang
cs.AI

Abstract

Text-to-image diffusion models deliver high-quality images, yet aligning them with human preferences remains challenging. We revisit diffusion-based Direct Preference Optimization (DPO) for these models and identify a critical pathology: enlarging the preference margin does not necessarily improve generation quality. In particular, the standard Diffusion-DPO objective can increase the reconstruction error of both the winner and loser branches, and the degradation of the less-preferred outputs can become severe enough that the preferred branch is adversely affected even as the margin grows. To address this, we introduce Diffusion-SDPO, a safeguarded update rule that preserves the winner by adaptively scaling the loser gradient according to its alignment with the winner gradient. A first-order analysis yields a closed-form scaling coefficient that guarantees the error of the preferred output is non-increasing at each optimization step. Our method is simple, model-agnostic, and broadly compatible with existing DPO-style alignment frameworks, and it adds only marginal computational overhead. Across standard text-to-image benchmarks, Diffusion-SDPO delivers consistent gains over preference-learning baselines on automated preference, aesthetic, and prompt-alignment metrics. Code is publicly available at https://github.com/AIDC-AI/Diffusion-SDPO.
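
To make the safeguard concrete, below is a minimal PyTorch sketch of the kind of closed-form scaling the abstract describes, written directly from its first-order argument. The function name safeguarded_loser_scale, the cap at 1, and the eps stabilizer are illustrative assumptions rather than the authors' code; the paper's exact coefficient and training loop are in the linked repository.

import torch

def safeguarded_loser_scale(grad_w, grad_l, eps=1e-8):
    # First-order safeguard: a step along -(g_w + lam * g_l) changes the
    # winner's error by roughly -(||g_w||^2 + lam * <g_w, g_l>). If the two
    # gradients agree (<g_w, g_l> >= 0), the winner can only improve and no
    # scaling is needed. If they conflict, capping
    #     lam <= ||g_w||^2 / -<g_w, g_l>
    # keeps that first-order change non-positive.
    gw, gl = grad_w.flatten(), grad_l.flatten()
    dot = torch.dot(gw, gl)
    if dot >= 0:
        return gw.new_tensor(1.0)  # no conflict: keep the loser term at full strength
    lam = gw.pow(2).sum() / (-dot + eps)  # largest coefficient that is still safe
    return torch.clamp(lam, max=1.0)

# Hypothetical usage with flattened per-branch gradients:
g_w = torch.randn(4096)  # gradient lowering the preferred (winner) branch error
g_l = torch.randn(4096)  # gradient contributed by the less-preferred (loser) branch
lam = safeguarded_loser_scale(g_w, g_l)
safe_grad = g_w + lam * g_l  # safeguarded combined update direction

Under this rule, the loser gradient is applied at full strength whenever it does not oppose the winner gradient, and is shrunk just enough otherwise, which is what makes the preferred output's error non-increasing to first order at each step.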