BEATS: Bias Evaluation and Assessment Test Suite for Large Language Models

Abstract

In this research, we introduce BEATS, a novel framework for evaluating Bias, Ethics, Fairness, and Factuality in Large Language Models (LLMs). Building upon the BEATS framework, we present a bias benchmark for LLMs that measures performance across 29 distinct metrics. These metrics span a broad range of characteristics, including demographic, cognitive, and social biases, as well as measures of ethical reasoning, group fairness, and factuality-related misinformation risk. Together, they enable a quantitative assessment of the extent to which LLM-generated responses may perpetuate societal prejudices that reinforce or expand systemic inequities. To achieve a high score on this benchmark, an LLM must show highly equitable behavior in its responses, making it a rigorous standard for responsible AI evaluation. Empirical results based on data from our experiment show that 37.65% of outputs generated by industry-leading models contained some form of bias, highlighting a substantial risk of using these models in critical decision-making systems. The BEATS framework and benchmark offer a scalable and statistically rigorous methodology to benchmark LLMs, diagnose factors driving biases, and develop mitigation strategies. With the BEATS framework, our goal is to support the development of more socially responsible and ethically aligned AI models.
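
As a rough illustration of how a headline figure such as the 37.65% reported above could be computed, the sketch below assumes that each model output carries one boolean flag per metric and counts an output as biased if any flag is set. The data layout, the function name `share_with_any_bias`, and the metric names are hypothetical and are not taken from the BEATS implementation; the actual benchmark may score and aggregate outputs differently.

```python
from typing import Dict, List


def share_with_any_bias(flags_per_output: List[Dict[str, bool]]) -> float:
    """Return the percentage of outputs flagged on at least one metric.

    Assumption: each output is represented as a mapping from a metric
    name to a boolean "bias detected" flag. This layout is illustrative
    only and is not the BEATS data format.
    """
    if not flags_per_output:
        raise ValueError("need at least one output")
    # An output counts as biased if any of its metric flags is True.
    biased = sum(1 for flags in flags_per_output if any(flags.values()))
    return 100.0 * biased / len(flags_per_output)


# Hypothetical toy data: four outputs, two made-up metric names.
toy_outputs = [
    {"demographic_bias": False, "cognitive_bias": False},
    {"demographic_bias": True, "cognitive_bias": False},
    {"demographic_bias": False, "cognitive_bias": False},
    {"demographic_bias": True, "cognitive_bias": True},
]

rate = share_with_any_bias(toy_outputs)
print(f"{rate:.2f}% of toy outputs flagged on at least one metric")
```

Under this "any flag" rule, an output that trips several metrics still counts once, so the figure reads as a share of outputs rather than a share of metric violations.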
