
Beyond Binary Preference: Aligning Diffusion Models to Fine-grained Criteria by Decoupling Attributes

January 7, 2026
Authors: Chenye Meng, Zejian Li, Zhongni Liu, Yize Li, Changle Xie, Kaixin Jia, Ling Yang, Huanghuang Deng, Shiying Ding, Shengyuan Zhang, Jiayi Li, Lingyun Sun
cs.AI

Abstract

Post-training alignment of diffusion models typically relies on simplified signals such as scalar rewards or binary preferences, which limits alignment with complex human expertise that is hierarchical and fine-grained. To address this, we first construct hierarchical, fine-grained evaluation criteria with domain experts, decomposing image quality into multiple positive and negative attributes organized in a tree structure. Building on this, we propose a two-stage alignment framework. First, we inject domain knowledge into an auxiliary diffusion model via Supervised Fine-Tuning. Second, we introduce Complex Preference Optimization (CPO), which extends DPO to align the target diffusion model with our non-binary, hierarchical criteria. Specifically, we reformulate the alignment problem to simultaneously maximize the probability of positive attributes and minimize the probability of negative attributes, guided by the auxiliary diffusion model. We instantiate our approach in the domain of painting generation and conduct CPO training on a dataset of paintings annotated with fine-grained attributes based on our criteria. Extensive experiments demonstrate that CPO significantly enhances generation quality and alignment with expert knowledge, opening new avenues for fine-grained criteria alignment.