

Aesthetic Alignment Risks Assimilation: How Image Generation and Reward Models Reinforce Beauty Bias and Ideological "Censorship"

December 9, 2025
Authors: Wenqi Marshall Guo, Qingyun Qian, Khalad Hasan, Shan Du
cs.AI

Abstract

Over-aligning image generation models to a generalized aesthetic preference conflicts with user intent, particularly when "anti-aesthetic" outputs are requested for artistic or critical purposes. This adherence prioritizes developer-centered values, compromising user autonomy and aesthetic pluralism. We test this bias by constructing a wide-spectrum aesthetics dataset and evaluating state-of-the-art generation and reward models. We find that aesthetic-aligned generation models frequently default to conventionally beautiful outputs, failing to respect instructions for low-quality or negative imagery. Crucially, reward models penalize anti-aesthetic images even when they perfectly match the explicit user prompt. We confirm this systemic bias through image-to-image editing and evaluation against real abstract artworks.