Beyond Human Judgment: A Bayesian Evaluation of LLMs' Moral Values Understanding

Abstract

How do large language models understand moral dimensions compared to humans? This first large-scale Bayesian evaluation of market-leading language models provides the answer. In contrast to prior work that uses deterministic ground truth (majority or inclusion rules), we model annotator disagreement to capture both aleatoric uncertainty (inherent human disagreement) and epistemic uncertainty (model domain sensitivity). We evaluate top language models (Claude Sonnet 4, DeepSeek-V3, Llama 4 Maverick) on 250K+ annotations from ~700 annotators covering 100K+ texts spanning social media, news, and forums. Our GPU-optimized Bayesian framework processed 1M+ model queries, revealing that AI models typically rank among the top 25% of human annotators, achieving much better-than-average balanced accuracy. Importantly, we find that AI models produce far fewer false negatives than humans, highlighting their more sensitive moral detection capabilities.
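To make the quantities named in the abstract concrete, the following is a minimal sketch, not the paper's framework. It assumes a simple Beta-Binomial model as a stand-in for "modeling annotator disagreement" (the abstract does not specify the actual model), and the function names `beta_posterior_mean`, `balanced_accuracy`, and `false_negative_rate` are hypothetical, chosen only for illustration.

```python
def beta_posterior_mean(positive_votes, total_votes, prior_a=1.0, prior_b=1.0):
    """Posterior mean of P(moral label present) for one text.

    Assumption: annotator votes are Binomial draws and the prior is
    Beta(prior_a, prior_b). This is an illustrative model only.
    """
    return (prior_a + positive_votes) / (prior_a + prior_b + total_votes)


def _confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of the true-positive rate and the true-negative rate."""
    tp, tn, fp, fn = _confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return (tpr + tnr) / 2


def false_negative_rate(y_true, y_pred):
    """Share of truly positive items that were predicted negative."""
    tp, _, _, fn = _confusion_counts(y_true, y_pred)
    return fn / (tp + fn) if (tp + fn) else 0.0
```

Under this sketch, a text with 3 positive votes out of 4 receives a soft label of about 0.67 rather than a hard majority label of 1, which is the kind of disagreement information a deterministic ground truth discards. Balanced accuracy averages the per-class recall, so it is not inflated when most texts carry no moral label.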
