

Beyond Human Judgment: A Bayesian Evaluation of LLMs' Moral Values Understanding

August 19, 2025
作者: Maciej Skorski, Alina Landowska
cs.AI

Abstract

How do large language models understand moral dimensions compared to humans? This first large-scale Bayesian evaluation of market-leading language models provides the answer. In contrast to prior work using deterministic ground truth (majority or inclusion rules), we model annotator disagreements to capture both aleatoric uncertainty (inherent human disagreement) and epistemic uncertainty (model domain sensitivity). We evaluate top language models (Claude Sonnet 4, DeepSeek-V3, Llama 4 Maverick) across 250K+ annotations from ~700 annotators on 100K+ texts spanning social media, news, and forums. Our GPU-optimized Bayesian framework processed 1M+ model queries, revealing that AI models typically rank among the top 25% of human annotators and achieve much better-than-average balanced accuracy. Importantly, we find that AI produces far fewer false negatives than humans, highlighting its more sensitive moral detection capabilities.
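The core idea of scoring an annotator (human or model) probabilistically, rather than against a deterministic majority label, can be illustrated with a minimal sketch. The counts, the uniform Beta priors, and the single-annotator setup below are illustrative assumptions, not the paper's actual model: we place Beta posteriors on sensitivity and specificity and derive a posterior over balanced accuracy, so that the credible interval reflects uncertainty rather than a single point score.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical confusion counts for one annotator on a moral-dimension
# label (e.g. measured against the pooled judgments of other annotators):
tp, fn = 45, 15    # positives:  45 detected, 15 missed (false negatives)
tn, fp = 180, 10   # negatives: 180 correct, 10 false alarms

# Conjugate Beta posteriors with uniform Beta(1, 1) priors.
n_draws = 10_000
sens = rng.beta(1 + tp, 1 + fn, size=n_draws)  # P(flag | truly moral content)
spec = rng.beta(1 + tn, 1 + fp, size=n_draws)  # P(pass | truly neutral content)

# Posterior over balanced accuracy, with a 95% credible interval.
bal_acc = (sens + spec) / 2
lo, hi = np.percentile(bal_acc, [2.5, 97.5])
print(f"balanced accuracy ~ {bal_acc.mean():.3f} "
      f"(95% CI [{lo:.3f}, {hi:.3f}])")
```

Repeating this for every human annotator and every model yields comparable posterior distributions, so a model can be ranked against the human population (e.g. "top 25%") while honestly propagating disagreement-driven uncertainty.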