When Preferences Diverge: Aligning Diffusion Models with Minority-Aware Adaptive DPO
March 21, 2025
Authors: Lingfan Zhang, Chen Liu, Chengming Xu, Kai Hu, Donghao Luo, Chengjie Wang, Yanwei Fu, Yuan Yao
cs.AI
Abstract
In recent years, the field of image generation has witnessed significant
advancements, particularly in fine-tuning methods that align models with
universal human preferences. This paper explores the critical role of
preference data in the training process of diffusion models, particularly in
the context of Diffusion-DPO and its subsequent adaptations. We investigate the
complexities surrounding universal human preferences in image generation,
highlighting the subjective nature of these preferences and the challenges
posed by minority samples in preference datasets. Through pilot experiments, we
demonstrate the existence of minority samples and their detrimental effects on
model performance. We propose Adaptive-DPO -- a novel approach that
incorporates a minority-instance-aware metric into the DPO objective. This
metric, which includes intra-annotator confidence and inter-annotator
stability, distinguishes between majority and minority samples. We introduce an
Adaptive-DPO loss function which improves the DPO loss in two ways: enhancing
the model's learning of majority labels while mitigating the negative impact of
minority samples. Our experiments demonstrate that this method effectively
handles both synthetic minority data and real-world preference data, paving the
way for more effective training methodologies in image generation tasks.
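The abstract does not give the exact form of the Adaptive-DPO objective, so the following is a minimal, hypothetical PyTorch sketch of a reweighted DPO-style loss. The per-sample weight `w`, derived here from the implicit reward margin, merely stands in for the paper's minority-awareness metric (intra-annotator confidence and inter-annotator stability); the function name `adaptive_dpo_loss`, its inputs, and the `gamma` parameter are illustrative assumptions rather than the authors' implementation.

```python
# Hypothetical sketch only: the weighting below is NOT the paper's
# annotator-based metric, just an illustrative stand-in.
import torch
import torch.nn.functional as F


def adaptive_dpo_loss(logratio_win: torch.Tensor,
                      logratio_lose: torch.Tensor,
                      beta: float = 0.1,
                      gamma: float = 1.0) -> torch.Tensor:
    """Reweighted DPO-style preference loss.

    logratio_win / logratio_lose: per-sample log(pi_theta / pi_ref) for the
        preferred and dispreferred generations (for diffusion models these
        are typically approximated from denoising errors).
    beta:  DPO inverse-temperature.
    gamma: sharpness of the hypothetical minority down-weighting.
    """
    # Implicit reward margin between preferred and dispreferred samples.
    logits = beta * (logratio_win - logratio_lose)

    # Hypothetical minority-awareness weight: samples whose implicit reward
    # strongly contradicts the preference label (very negative margin) are
    # treated as likely minority/noisy annotations and down-weighted.
    # Computed without gradient so the optimizer cannot exploit the weight.
    with torch.no_grad():
        w = torch.sigmoid(gamma * logits)

    # Standard Bradley-Terry / DPO term, reweighted per sample.
    per_sample = -F.logsigmoid(logits)
    return (w * per_sample).mean()
```

A design note on this sketch: keeping the weight outside the gradient path means the model is only steered by the reweighted DPO term itself, so down-weighting suspected minority labels reduces their influence without letting the network trivially lower the loss by pushing their margins further negative.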