

Beyond Correctness: Evaluating Subjective Writing Preferences Across Cultures

October 16, 2025
Authors: Shuangshuang Ying, Yunwen Li, Xingwei Qu, Xin Li, Sheng Jin, Minghao Liu, Zhoufutu Wen, Xeron Du, Tianyu Zheng, Yichi Zhang, Letian Ni, Yuyang Cheng, Qiguang Chen, Jingzhe Ding, Shengda Long, Wangchunshu Zhou, Jiazhan Feng, Wanjun Zhong, Libo Qin, Ge Zhang, Wenhao Huang, Wanxiang Che, Chenghua Lin
cs.AI

Abstract

Current preference learning methods achieve high accuracy on standard benchmarks but exhibit significant performance degradation when objective quality signals are removed. We introduce WritingPreferenceBench, a dataset of 1,800 human-annotated preference pairs (1,200 English, 600 Chinese) across 8 creative writing genres, where responses are matched for objective correctness, factual accuracy, and length. On this benchmark, sequence-based reward models (the standard architecture for RLHF) achieve only 52.7% mean accuracy, while zero-shot language model judges perform at 53.9%. In contrast, generative reward models that produce explicit reasoning chains achieve 81.8% accuracy. We observe high within-model variance across genres: individual models range from 18.2% to 81.8% accuracy across different writing categories, with standard deviations averaging 10.1%. This variance persists regardless of model scale, with 27B parameter models showing no consistent improvement over 8B variants. Our results suggest that current RLHF methods primarily learn to detect objective errors rather than capture subjective quality preferences (e.g., creativity, stylistic flair, and emotional resonance), and that successful preference modeling may require intermediate reasoning representations rather than direct classification.
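
The accuracy figures above are pairwise preference accuracies: a reward model or judge scores both responses in a matched pair and is credited when it ranks the human-preferred response higher. The sketch below illustrates that evaluation protocol under stated assumptions; the field names (prompt/chosen/rejected) and the toy scoring function are hypothetical stand-ins for a real reward model, not the paper's released code.

```python
# Minimal sketch of pairwise preference-accuracy evaluation on matched pairs.
# Field names and the toy scorer are illustrative assumptions, not the paper's code.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PreferencePair:
    prompt: str    # the writing instruction
    chosen: str    # the human-preferred response
    rejected: str  # the less-preferred response (matched for correctness and length)


def pairwise_accuracy(pairs: List[PreferencePair],
                      score: Callable[[str, str], float]) -> float:
    """Fraction of pairs where the model scores the chosen response higher."""
    correct = sum(score(p.prompt, p.chosen) > score(p.prompt, p.rejected)
                  for p in pairs)
    return correct / len(pairs)


if __name__ == "__main__":
    # Toy scorer standing in for a sequence-based reward model.
    toy_score = lambda prompt, response: float(len(set(response.split())))
    demo = [PreferencePair("Write a short poem about rain.",
                           "Silver threads stitch the rooftops to the sky.",
                           "Rain falls. It is wet. The end.")]
    print(f"pairwise accuracy: {pairwise_accuracy(demo, toy_score):.2f}")
```

In practice, `score` would wrap a trained reward model or an LLM judge; the protocol itself is just this comparison, so chance performance is 50% and the 52.7% figure for sequence-based reward models sits barely above it.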