
When Preferences Diverge: Aligning Diffusion Models with Minority-Aware Adaptive DPO

March 21, 2025
Authors: Lingfan Zhang, Chen Liu, Chengming Xu, Kai Hu, Donghao Luo, Chengjie Wang, Yanwei Fu, Yuan Yao
cs.AI

Abstract

In recent years, the field of image generation has witnessed significant advances, particularly in fine-tuning methods that align models with universal human preferences. This paper examines the critical role of preference data in the training of diffusion models, focusing on Diffusion-DPO and its subsequent adaptations. We investigate the complexities surrounding universal human preferences in image generation, highlighting the subjective nature of these preferences and the challenges posed by minority samples in preference datasets. Through pilot experiments, we demonstrate the existence of minority samples and their detrimental effect on model performance. We propose Adaptive-DPO, a novel approach that incorporates a minority-instance-aware metric into the DPO objective. This metric, which combines intra-annotator confidence and inter-annotator stability, distinguishes majority samples from minority samples. Building on it, we introduce an Adaptive-DPO loss function that improves the DPO loss in two ways: strengthening the model's learning of majority labels while mitigating the negative impact of minority samples. Our experiments show that this method handles both synthetic minority data and real-world preference data effectively, paving the way for more effective training methodologies in image generation tasks.
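The abstract names the two ingredients of the method, a per-pair minority-awareness signal and a reweighted DPO objective, but does not give their exact form. The following PyTorch sketch is one plausible instantiation under stated assumptions: the minority weight here is derived from the model's own implicit reward margin as a stand-in for the paper's annotator-based metric (intra-annotator confidence and inter-annotator stability), and all names (adaptive_dpo_loss, beta, gamma) are hypothetical.

```python
import torch
import torch.nn.functional as F

def adaptive_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                      beta=0.1, gamma=2.0):
    """Illustrative minority-aware DPO loss (not the paper's exact formulation).

    logp_w / logp_l: policy log-likelihood proxies for the preferred ("winner")
    and rejected ("loser") images; in Diffusion-DPO these come from per-pair
    denoising losses. ref_logp_* are the same quantities under the frozen
    reference model.
    """
    # Implicit reward margin, as in standard (Diffusion-)DPO.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))

    # Assumed per-pair weight: pairs whose preference label the model fits
    # easily are treated as majority-consistent and kept near full weight;
    # pairs it consistently resists are down-weighted as likely minority
    # labels. The paper instead builds this weight from annotator statistics,
    # which the abstract does not specify in detail.
    with torch.no_grad():
        weight = torch.sigmoid(gamma * margin)

    # Standard DPO term, reweighted so minority-like pairs contribute less.
    per_pair = -F.logsigmoid(margin)
    return (weight * per_pair).mean()

# Toy usage on a batch of four preference pairs.
lw, ll = torch.randn(4), torch.randn(4)
rw, rl = torch.randn(4), torch.randn(4)
print(adaptive_dpo_loss(lw, ll, rw, rl))
```

In this toy form, the design point the abstract describes is visible: learning from majority-consistent pairs is preserved at (near) full strength, while pairs the model persistently resists, which are more likely to carry minority labels, are attenuated rather than allowed to dominate training.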
