When Preferences Diverge: Aligning Diffusion Models with Minority-Aware Adaptive DPO

Abstract

Image generation has advanced considerably in recent years, particularly in fine-tuning methods that align models with universal human preferences. This paper examines the role of preference data in training diffusion models, focusing on Diffusion-DPO and its subsequent adaptations. We investigate the complexities surrounding universal human preferences in image generation, highlighting their subjective nature and the challenges posed by minority samples in preference datasets. Through pilot experiments, we demonstrate that minority samples exist and that they degrade model performance. We propose Adaptive-DPO, a novel approach that incorporates a minority-instance-aware metric into the DPO objective. This metric, which includes intra-annotator confidence and inter-annotator stability, distinguishes majority samples from minority samples. The resulting Adaptive-DPO loss improves on the DPO loss in two ways: it strengthens the model's learning of majority labels and mitigates the negative impact of minority samples. Our experiments show that the method effectively handles both synthetic minority data and real-world preference data, paving the way for more effective training methodologies in image generation tasks.
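To make the idea of reweighting a DPO objective concrete, the following is a minimal sketch and not the Adaptive-DPO loss itself, which the abstract does not specify. It implements the standard DPO loss for a preference pair and a weighted mean over pairs. The function names, the `weights` argument (a hypothetical per-pair score in [0, 1] where low values mark suspected minority samples), and the use of plain log-probabilities are all assumptions for illustration; Diffusion-DPO works with a diffusion-specific surrogate for these log-probabilities, which is not modeled here.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    logp_w / logp_l: policy log-probabilities of the preferred and
    rejected samples; ref_logp_*: the same under the frozen reference.
    """
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -log_sigmoid(beta * margin)


def weighted_dpo_loss(pairs, weights, beta=0.1):
    """Weighted mean of per-pair DPO losses.

    `weights` is a hypothetical per-pair score (assumption, not from the
    paper): values near 0 shrink the contribution of suspected minority
    pairs, values near 1 keep majority pairs at full strength.
    """
    total = sum(w * dpo_loss(*p, beta=beta) for p, w in zip(pairs, weights))
    return total / max(sum(weights), 1e-8)


if __name__ == "__main__":
    # Each pair: (logp_w, logp_l, ref_logp_w, ref_logp_l)
    pairs = [(1.0, -1.0, 0.0, 0.0), (-1.0, 1.0, 0.0, 0.0)]
    print(weighted_dpo_loss(pairs, [1.0, 1.0]))  # both pairs count equally
    print(weighted_dpo_loss(pairs, [1.0, 0.0]))  # second pair is ignored
```

In this sketch a weight of zero removes a pair from the objective entirely; how the paper's metric actually maps intra-annotator confidence and inter-annotator stability onto the loss is left to the full text.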
