Quantifying Fairness in LLMs Beyond Tokens: A Semantic and Statistical Perspective

Abstract

Large Language Models (LLMs) often generate responses with inherent biases, undermining their reliability in real-world applications. Existing evaluation methods tend to overlook biases in long-form responses and the intrinsic variability of LLM outputs. To address these challenges, we propose FiSCo (Fine-grained Semantic Computation), a novel statistical framework that evaluates group-level fairness in LLMs by detecting subtle semantic differences in long-form responses across demographic groups. Unlike prior work focusing on sentiment or token-level comparisons, FiSCo goes beyond surface-level analysis by operating at the claim level, leveraging entailment checks to assess the consistency of meaning across responses. We decompose model outputs into semantically distinct claims and apply statistical hypothesis testing to compare inter- and intra-group similarities, enabling robust detection of subtle biases. We formalize a new group counterfactual fairness definition and validate FiSCo on both synthetic and human-annotated datasets spanning gender, race, and age. Experiments show that FiSCo more reliably identifies nuanced biases while reducing the impact of stochastic LLM variability, outperforming various existing evaluation metrics.
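The abstract describes comparing intra-group and inter-group similarity of claim-decomposed responses with a statistical hypothesis test, but does not give the exact statistic. The Python sketch below illustrates that general idea under stated assumptions: each response is assumed to be already decomposed into a set of claim strings; exact-match Jaccard overlap of claim sets is a hypothetical stand-in for FiSCo's entailment-based claim matching; and a one-sided permutation test on the gap between mean intra-group and mean inter-group similarity stands in for whatever test the authors actually use. The names `claim_similarity`, `intra_inter`, `similarity_gap`, and `permutation_test` are illustrative, not taken from the paper.

```python
import itertools
import random
from statistics import mean
from typing import FrozenSet, List, Sequence, Tuple

# Assumption: a response has already been decomposed into a set of claim strings.
Response = FrozenSet[str]


def claim_similarity(a: Response, b: Response) -> float:
    """Hypothetical stand-in for entailment-based claim matching:
    Jaccard overlap of the two responses' claim sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def intra_inter(group_a: Sequence[Response],
                group_b: Sequence[Response]) -> Tuple[List[float], List[float]]:
    """Pairwise similarities within each group and across the two groups."""
    intra = [claim_similarity(x, y)
             for group in (group_a, group_b)
             for x, y in itertools.combinations(group, 2)]
    inter = [claim_similarity(x, y) for x in group_a for y in group_b]
    return intra, inter


def similarity_gap(group_a: Sequence[Response],
                   group_b: Sequence[Response]) -> float:
    """Mean intra-group similarity minus mean inter-group similarity."""
    intra, inter = intra_inter(group_a, group_b)
    return mean(intra) - mean(inter)


def permutation_test(group_a: Sequence[Response],
                     group_b: Sequence[Response],
                     n_perm: int = 2000,
                     seed: int = 0) -> Tuple[float, float]:
    """One-sided permutation test of H0: group labels are exchangeable.

    A large positive gap means responses agree more within a group than
    across groups, i.e. the group label changes what the model says."""
    rng = random.Random(seed)
    observed = similarity_gap(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        if similarity_gap(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)  # add-one correction keeps p > 0
    return observed, p_value


# Toy data (hypothetical): each group's responses repeat the same claims.
group_a = [frozenset({"qualified", "leadership role", "high salary"})] * 5
group_b = [frozenset({"qualified", "support role", "low salary"})] * 5
gap, p = permutation_test(group_a, group_b)
print(f"gap={gap:.2f}, p={p:.4f}")
```

Pairwise similarity scores share responses and are therefore not independent, which is why this sketch permutes group labels instead of running a t-test on the pairwise scores directly. In the toy run the two groups overlap only on "qualified", so the observed gap is 0.80 and the p-value falls below 0.05; passing the same group twice gives a gap of 0.0 and p = 1.0.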