SHARP: Social Harm Analysis via Risk Profiles for Measuring Inequities in Large Language Models
January 29, 2026
Authors: Alok Abhishek, Tushar Bandopadhyay, Lisa Erickson
cs.AI
Abstract
Large language models (LLMs) are increasingly deployed in high-stakes domains, where rare but severe failures can result in irreversible harm. However, prevailing evaluation benchmarks often reduce complex social risk to mean-centered scalar scores, thereby obscuring distributional structure, cross-dimensional interactions, and worst-case behavior. This paper introduces Social Harm Analysis via Risk Profiles (SHARP), a framework for multidimensional, distribution-aware evaluation of social harm. SHARP models harm as a multivariate random variable and integrates explicit decomposition into bias, fairness, ethics, and epistemic reliability with a union-of-failures aggregation reparameterized as additive cumulative log-risk. The framework further employs risk-sensitive distributional statistics, with Conditional Value at Risk (CVaR95) as a primary metric, to characterize worst-case model behavior. Application of SHARP to eleven frontier LLMs, evaluated on a fixed corpus of n=901 socially sensitive prompts, reveals that models with similar average risk can exhibit more than twofold differences in tail exposure and volatility. Across models, marginal tail behavior varies systematically by harm dimension, with bias exhibiting the strongest tail severities, epistemic and fairness risks occupying intermediate regimes, and ethical misalignment consistently lower; together, these patterns reveal heterogeneous, model-dependent failure structures that scalar benchmarks conflate. These findings indicate that responsible evaluation and governance of LLMs require moving beyond scalar averages toward multidimensional, tail-sensitive risk profiling.
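The two quantities named in the abstract admit compact definitions. A union-of-failures aggregate is the probability that at least one harm dimension fails, which becomes additive in log-space over per-dimension survival terms; CVaR at level 0.95 is the mean risk over the worst 5% of prompts. The sketch below illustrates both under stated assumptions (independence across dimensions, empirical CVaR from a sorted sample); the function names and the synthetic beta-distributed scores are illustrative, not the paper's implementation.

```python
import numpy as np

def union_of_failures(p):
    """Probability that at least one harm dimension fails, computed via
    the additive cumulative log-risk reparameterization:
    log(1 - P_union) = sum_d log(1 - p_d)  (assuming independence)."""
    log_survival = np.sum(np.log1p(-np.asarray(p, dtype=float)))
    return 1.0 - np.exp(log_survival)

def cvar(risks, alpha=0.95):
    """Empirical Conditional Value at Risk: mean risk in the worst
    (1 - alpha) tail of the sample distribution."""
    risks = np.asarray(risks, dtype=float)
    var = np.quantile(risks, alpha)  # Value at Risk threshold
    return risks[risks >= var].mean()

# Per-dimension harm probabilities for one prompt (bias, fairness,
# ethics, epistemic) -- hypothetical values for illustration only.
print(union_of_failures([0.10, 0.05, 0.02, 0.08]))

# Two synthetic models with similar mean risk but different tails:
rng = np.random.default_rng(0)
low_vol = rng.beta(2.0, 18.0, size=901)     # mean ~0.10, light tail
heavy   = rng.beta(0.5, 4.5, size=901)      # mean ~0.10, heavy tail
print(cvar(low_vol), cvar(heavy))           # CVaR95 separates them
```

This is why two models with identical average risk can diverge sharply under CVaR95: the metric ignores the bulk of the distribution and scores only the tail.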