
Aesthetic Alignment Risks Assimilation: How Image Generation and Reward Models Reinforce Beauty Bias and Ideological "Censorship"

December 9, 2025
作者: Wenqi Marshall Guo, Qingyun Qian, Khalad Hasan, Shan Du
cs.AI

Abstract

Over-aligning image generation models to a generalized aesthetic preference conflicts with user intent, particularly when "anti-aesthetic" outputs are requested for artistic or critical purposes. This alignment prioritizes developer-centered values, compromising user autonomy and aesthetic pluralism. We test this bias by constructing a wide-spectrum aesthetics dataset and evaluating state-of-the-art generation and reward models. We find that aesthetic-aligned generation models frequently default to conventionally beautiful outputs, failing to respect instructions for low-quality or negative imagery. Crucially, reward models penalize anti-aesthetic images even when they perfectly match the explicit user prompt. We confirm this systemic bias through image-to-image editing experiments and evaluation against real abstract artworks.
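The reward-model bias described above can be sketched with a toy experiment: score a prompt-faithful anti-aesthetic image against an off-prompt but conventionally beautiful one under a beauty-biased reward. The `toy_reward` function below is a hypothetical stand-in, not the authors' reward model; the fidelity weight and beauty bonus are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    prompt: str            # the explicit user request
    image_tag: str         # "anti_aesthetic" or "conventionally_beautiful"
    prompt_fidelity: float # 0..1, how well the image matches the prompt

def toy_reward(sample: Sample) -> float:
    """Hypothetical aesthetic-aligned reward: it rewards prompt fidelity,
    but adds a fixed bonus for conventionally beautiful imagery,
    mimicking the beauty bias the paper reports."""
    beauty_bonus = 0.6 if sample.image_tag == "conventionally_beautiful" else 0.0
    return 0.5 * sample.prompt_fidelity + beauty_bonus

prompt = "a blurry, low-quality photo of a dark alley"
anti = Sample(prompt, "anti_aesthetic", prompt_fidelity=1.0)
pretty = Sample(prompt, "conventionally_beautiful", prompt_fidelity=0.2)

# Under a beauty-biased reward, the off-prompt beautiful image can
# out-score the image that fully satisfies the user's request.
print(round(toy_reward(anti), 2))    # faithful anti-aesthetic image
print(round(toy_reward(pretty), 2))  # off-prompt but beautiful image
```

The gap between the two scores is the bias the paper probes: a well-calibrated reward should rank the prompt-faithful image higher regardless of conventional beauty.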