Beyond Correctness: Evaluating Subjective Writing Preferences Across Cultures
October 16, 2025
Authors: Shuangshuang Ying, Yunwen Li, Xingwei Qu, Xin Li, Sheng Jin, Minghao Liu, Zhoufutu Wen, Xeron Du, Tianyu Zheng, Yichi Zhang, Letian Ni, Yuyang Cheng, Qiguang Chen, Jingzhe Ding, Shengda Long, Wangchunshu Zhou, Jiazhan Feng, Wanjun Zhong, Libo Qin, Ge Zhang, Wenhao Huang, Wanxiang Che, Chenghua Lin
cs.AI
Abstract
Current preference learning methods achieve high accuracy on standard
benchmarks but exhibit significant performance degradation when objective
quality signals are removed. We introduce WritingPreferenceBench, a dataset of
1,800 human-annotated preference pairs (1,200 English, 600 Chinese) across 8
creative writing genres, where responses are matched for objective correctness,
factual accuracy, and length. On this benchmark, sequence-based reward
models--the standard architecture for RLHF--achieve only 52.7% mean accuracy,
while zero-shot language model judges perform at 53.9%. In contrast, generative
reward models that produce explicit reasoning chains achieve 81.8% accuracy. We
observe high within-model variance across genres: individual models range from
18.2% to 81.8% accuracy across different writing categories, with standard
deviations averaging 10.1%. This variance persists regardless of model scale,
with 27B parameter models showing no consistent improvement over 8B variants.
Our results suggest that current RLHF methods primarily learn to detect
objective errors rather than capture subjective quality preferences (e.g.,
creativity, stylistic flair, and emotional resonance), and that successful
preference modeling may require intermediate reasoning representations rather
than direct classification.
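
To make the reported metric concrete, the sketch below shows how pairwise preference accuracy on a benchmark like this is typically computed: a reward model scores both responses in each pair, and is credited when it ranks the human-preferred response above the rejected one. This is an illustrative reconstruction, not the authors' released code; the PreferencePair schema, field names, and genre labels are assumptions, and the length-based scorer is a deliberately naive stand-in for a real sequence-based reward model.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class PreferencePair:
    prompt: str    # writing instruction for one of the 8 genres
    chosen: str    # response preferred by human annotators
    rejected: str  # dispreferred response, matched for correctness and length
    genre: str     # genre label (names assumed for illustration)
    language: str  # "en" or "zh"

def pairwise_accuracy(pairs: List[PreferencePair],
                      score: Callable[[str, str], float]) -> float:
    """Fraction of pairs where the model scores `chosen` above `rejected`."""
    correct = sum(score(p.prompt, p.chosen) > score(p.prompt, p.rejected)
                  for p in pairs)
    return correct / len(pairs)

# Toy stand-in scorer: prefers longer responses. A real sequence-based
# reward model would map (prompt, response) to a scalar via a value head.
def toy_score(prompt: str, response: str) -> float:
    return float(len(response))

pairs = [
    PreferencePair(
        prompt="Write a haiku about autumn.",
        chosen="Leaves drift past my door / ...",
        rejected="Autumn is a season in which many leaves fall.",
        genre="poetry",
        language="en",
    ),
]
print(f"pairwise accuracy = {pairwise_accuracy(pairs, toy_score):.3f}")
```

On this toy pair the length-based scorer picks the rejected response, which mirrors the paper's central finding: a scorer keyed to surface signals rather than subjective quality fails once objective cues are matched across the pair.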