BEATS: Bias Evaluation and Assessment Test Suite for Large Language Models

Abstract

In this research, we introduce BEATS, a novel framework for evaluating Bias, Ethics, Fairness, and Factuality in Large Language Models (LLMs). Building on the BEATS framework, we present a bias benchmark for LLMs that measures performance across 29 distinct metrics. These metrics span a broad range of characteristics, including demographic, cognitive, and social biases, as well as measures of ethical reasoning, group fairness, and factuality-related misinformation risk. Together, they enable a quantitative assessment of the extent to which LLM-generated responses may perpetuate societal prejudices that reinforce or expand systemic inequities. To achieve a high score on this benchmark, an LLM must show very equitable behavior in its responses, which makes the benchmark a rigorous standard for responsible AI evaluation. Empirical results based on data from our experiment show that 37.65% of outputs generated by industry-leading models contained some form of bias, highlighting a substantial risk in using these models in critical decision-making systems. The BEATS framework and benchmark offer a scalable and statistically rigorous methodology to benchmark LLMs, diagnose the factors driving biases, and develop mitigation strategies. With the BEATS framework, our goal is to support the development of more socially responsible and ethically aligned AI models.
