

Beyond Human Judgment: A Bayesian Evaluation of LLMs' Moral Values Understanding

August 19, 2025
作者: Maciej Skorski, Alina Landowska
cs.AI

Abstract

How do large language models understand moral dimensions compared to humans? This first large-scale Bayesian evaluation of market-leading language models provides the answer. In contrast to prior work using deterministic ground truth (majority or inclusion rules), we model annotator disagreements to capture both aleatoric uncertainty (inherent human disagreement) and epistemic uncertainty (model domain sensitivity). We evaluate top language models (Claude Sonnet 4, DeepSeek-V3, Llama 4 Maverick) across 250K+ annotations from ~700 annotators on 100K+ texts spanning social media, news, and forums. Our GPU-optimized Bayesian framework processed 1M+ model queries, revealing that AI models typically rank among the top 25% of human annotators, achieving much better-than-average balanced accuracy. Importantly, we find that AI models produce far fewer false negatives than humans, highlighting their more sensitive moral detection capabilities.
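The evaluation logic described above can be illustrated with a small sketch. The toy data, the Beta(1, 1) prior, and the majority-style thresholding below are assumptions for illustration, not the paper's actual model: annotator votes per item are summarized by a Beta-Bernoulli posterior (capturing aleatoric disagreement), and a model's predictions are then scored by balanced accuracy, the mean of sensitivity and specificity.

```python
import numpy as np

# Hypothetical toy data: binary moral-label votes (1 = "moral dimension
# present") from four annotators on three items. Ground truth is uncertain,
# so per-item positive-label rates are modeled with a Beta-Bernoulli.
labels = np.array([
    [1, 1, 0, 1],   # item 0: four annotators' votes
    [0, 0, 0, 1],   # item 1
    [1, 0, 1, 1],   # item 2
])

# Posterior over each item's positive-label rate under a Beta(1, 1) prior:
# alpha = 1 + positives, beta = 1 + negatives (aleatoric uncertainty).
pos = labels.sum(axis=1)
n = labels.shape[1]
alpha, beta = 1 + pos, 1 + (n - pos)
posterior_mean = alpha / (alpha + beta)

# Score one hypothetical model prediction per item against the consensus
# implied by the posterior mean.
model_pred = np.array([1, 0, 1])
consensus = (posterior_mean > 0.5).astype(int)

# Balanced accuracy = mean of sensitivity and specificity, so it is not
# inflated on imbalanced data; false negatives directly lower sensitivity.
tp = np.sum((model_pred == 1) & (consensus == 1))
fn = np.sum((model_pred == 0) & (consensus == 1))
tn = np.sum((model_pred == 0) & (consensus == 0))
fp = np.sum((model_pred == 1) & (consensus == 0))
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
balanced_accuracy = 0.5 * (sensitivity + specificity)
print(balanced_accuracy)  # → 1.0 (the toy model matches the consensus)
```

Ranking a model "among the top 25% of human annotators" then amounts to computing the same balanced-accuracy statistic for each individual annotator against the consensus and placing the model in that distribution.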