

Aesthetic Alignment Risks Assimilation: How Image Generation and Reward Models Reinforce Beauty Bias and Ideological "Censorship"

December 9, 2025
Authors: Wenqi Marshall Guo, Qingyun Qian, Khalad Hasan, Shan Du
cs.AI

Abstract

Over-aligning image generation models to a generalized aesthetic preference conflicts with user intent, particularly when "anti-aesthetic" outputs are requested for artistic or critical purposes. This adherence prioritizes developer-centered values, compromising user autonomy and aesthetic pluralism. We test this bias by constructing a wide-spectrum aesthetics dataset and evaluating state-of-the-art generation and reward models. We find that aesthetic-aligned generation models frequently default to conventionally beautiful outputs, failing to respect instructions for low-quality or negative imagery. Crucially, reward models penalize anti-aesthetic images even when they perfectly match the explicit user prompt. We confirm this systemic bias through image-to-image editing and evaluation against real abstract artworks.