SHARP: Social Harm Analysis via Risk Profiles for Measuring Inequities in Large Language Models
January 29, 2026
Authors: Alok Abhishek, Tushar Bandopadhyay, Lisa Erickson
cs.AI
Abstract
Large language models (LLMs) are increasingly deployed in high-stakes domains, where rare but severe failures can result in irreversible harm. However, prevailing evaluation benchmarks often reduce complex social risk to mean-centered scalar scores, thereby obscuring distributional structure, cross-dimensional interactions, and worst-case behavior. This paper introduces Social Harm Analysis via Risk Profiles (SHARP), a framework for multidimensional, distribution-aware evaluation of social harm. SHARP models harm as a multivariate random variable and integrates explicit decomposition into bias, fairness, ethics, and epistemic reliability with a union-of-failures aggregation reparameterized as additive cumulative log-risk. The framework further employs risk-sensitive distributional statistics, with Conditional Value at Risk at the 95% level (CVaR95) as a primary metric, to characterize worst-case model behavior. Applying SHARP to eleven frontier LLMs, evaluated on a fixed corpus of n=901 socially sensitive prompts, reveals that models with similar average risk can exhibit more than twofold differences in tail exposure and volatility. Across models, marginal tail behavior varies systematically by harm dimension: bias exhibits the strongest tail severities, epistemic and fairness risks occupy intermediate regimes, and ethical misalignment remains consistently lower. Together, these patterns reveal heterogeneous, model-dependent failure structures that scalar benchmarks conflate. These findings indicate that responsible evaluation and governance of LLMs require moving beyond scalar averages toward multidimensional, tail-sensitive risk profiling.
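To make the aggregation concrete: with per-dimension harm risks r_d in [0, 1), the union-of-failures probability under an independence assumption is P = 1 - ∏_d (1 - r_d), and taking logs yields the additive cumulative log-risk R = -∑_d log(1 - r_d) = -log(1 - P), a monotone transform of P. The minimal Python sketch below illustrates this aggregation and the CVaR95 tail statistic; the beta-distributed placeholder scores, the four-dimension layout, and the cvar helper are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Placeholder per-prompt harm scores in [0, 1) for the four dimensions
# named in the abstract: bias, fairness, ethics, epistemic reliability.
# The beta distribution here is an arbitrary stand-in, not real data.
rng = np.random.default_rng(0)
scores = rng.beta(2, 20, size=(901, 4))  # n=901 prompts x 4 dimensions

# Union-of-failures aggregation reparameterized as additive cumulative
# log-risk: R = -sum_d log(1 - r_d) = -log(1 - P), where P is the
# union probability 1 - prod_d (1 - r_d) under independence.
log_risk = -np.log1p(-scores).sum(axis=1)

def cvar(x, alpha=0.95):
    """Conditional Value at Risk: mean of the worst (1 - alpha) tail."""
    threshold = np.quantile(x, alpha)  # Value at Risk (VaR) at level alpha
    return x[x >= threshold].mean()

print(f"mean risk: {log_risk.mean():.4f}")
print(f"CVaR95:    {cvar(log_risk, 0.95):.4f}")
```

A CVaR95 far above the mean on such per-prompt risk scores is exactly the tail exposure the abstract argues scalar averages conceal.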