Quantifying Fairness in LLMs Beyond Tokens: A Semantic and Statistical Perspective
June 23, 2025
Authors: Weijie Xu, Yiwen Wang, Chi Xue, Xiangkun Hu, Xi Fang, Guimin Dong, Chandan K. Reddy
cs.AI
Abstract
Large Language Models (LLMs) often generate responses with inherent biases,
undermining their reliability in real-world applications. Existing evaluation
methods often overlook biases in long-form responses and the intrinsic
variability of LLM outputs. To address these challenges, we propose
FiSCo (Fine-grained Semantic Computation), a novel statistical framework to
evaluate group-level fairness in LLMs by detecting subtle semantic differences
in long-form responses across demographic groups. Unlike prior work focusing on
sentiment or token-level comparisons, FiSCo goes beyond surface-level analysis
by operating at the claim level, leveraging entailment checks to assess the
consistency of meaning across responses. We decompose model outputs into
semantically distinct claims and apply statistical hypothesis testing to
compare inter- and intra-group similarities, enabling robust detection of
subtle biases. We formalize a new group counterfactual fairness definition and
validate FiSCo on both synthetic and human-annotated datasets spanning gender,
race, and age. Experiments show that FiSCo more reliably identifies nuanced
biases while reducing the impact of stochastic LLM variability, outperforming
various existing evaluation metrics.
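The claim-level comparison and group-level test described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. `split_claims` is a hypothetical period-based splitter standing in for FiSCo's claim decomposition. `toy_entails` is a hypothetical exact-match placeholder for a real entailment (NLI) model. The symmetric claim-overlap score is one assumed way to turn entailment checks into a similarity. A one-sided permutation test is swapped in for whatever hypothesis test the paper actually uses.

```python
import itertools
import random
import statistics
from typing import Callable, List, Sequence

# Hypothetical claim decomposition: split on periods and normalize case.
# Stand-in for FiSCo's claim extraction step, not the authors' method.
def split_claims(response: str) -> List[str]:
    return [c.strip().lower() for c in response.split(".") if c.strip()]

# Hypothetical placeholder for an entailment (NLI) check: a claim counts as
# supported only if the identical normalized claim appears in the premise.
def toy_entails(premise_claims: Sequence[str], claim: str) -> bool:
    return claim in premise_claims

def claim_similarity(a: str, b: str,
                     entails: Callable[[Sequence[str], str], bool] = toy_entails) -> float:
    """Assumed symmetric score: average share of each response's claims
    that the other response supports."""
    ca, cb = split_claims(a), split_claims(b)
    if not ca or not cb:
        return 0.0
    a_in_b = sum(entails(cb, c) for c in ca) / len(ca)
    b_in_a = sum(entails(ca, c) for c in cb) / len(cb)
    return (a_in_b + b_in_a) / 2

def intra_group_sims(group: Sequence[str]) -> List[float]:
    # Every unordered pair of responses within one group.
    return [claim_similarity(x, y) for x, y in itertools.combinations(group, 2)]

def inter_group_sims(g1: Sequence[str], g2: Sequence[str]) -> List[float]:
    # Every cross-group pair of responses.
    return [claim_similarity(x, y) for x in g1 for y in g2]

def permutation_test(intra: Sequence[float], inter: Sequence[float],
                     n_perm: int = 2000, seed: int = 0):
    """One-sided permutation test of mean(intra) - mean(inter) > 0.
    Pairwise scores share responses, so they are not independent; the
    p-value here is illustrative only."""
    observed = statistics.mean(intra) - statistics.mean(inter)
    pooled = list(intra) + list(inter)
    k = len(intra)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if statistics.mean(pooled[:k]) - statistics.mean(pooled[k:]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy responses (invented for illustration) to the same prompt, where only
# the demographic attribute in the prompt differs between the two groups.
group_a = [
    "The applicant is qualified. Recommend a senior role.",
    "The applicant is qualified. Recommend a senior role.",
    "The applicant is qualified. Recommend a senior role. Offer a bonus.",
]
group_b = [
    "The applicant is qualified. Recommend a junior role.",
    "The applicant is qualified. Recommend a junior role.",
    "The applicant is qualified. Recommend a junior role. Offer a bonus.",
]

intra = intra_group_sims(group_a) + intra_group_sims(group_b)
inter = inter_group_sims(group_a, group_b)
diff, p_value = permutation_test(intra, inter)
print(f"mean intra={statistics.mean(intra):.3f} "
      f"mean inter={statistics.mean(inter):.3f} p={p_value:.4f}")
```

In these toy responses, the two groups differ only in the recommended seniority. Intra-group similarity averages 8/9 and inter-group similarity averages 13/27, so the test flags a group-level difference. Exact matching cannot count paraphrased claims as agreeing. Replacing `toy_entails` with a real entailment model is what would let reworded but equivalent claims match, which is the point of checking at the claim level rather than the token level.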