Uncertainty Drives Social Bias Changes in Quantized Large Language Models

Abstract

Post-training quantization reduces the computational cost of large language models, but it also fundamentally alters their social biases in ways that aggregate metrics fail to capture. We present the first large-scale study of this effect, evaluating 50 quantized models on PostTrainingBiasBench, a unified benchmark of 13 closed- and open-ended bias datasets. We identify a phenomenon we term quantization-induced masked bias flipping: after quantization, up to 21% of responses flip between biased and unbiased states even though aggregate bias scores remain unchanged. These flips are strongly driven by model uncertainty. High-uncertainty responses are 3 to 11 times more likely to change than confident ones. Stronger quantization amplifies this effect, with 4-bit quantized models exhibiting 4 to 6 times more behavioral changes than 8-bit quantized models. Critically, these changes affect demographic groups asymmetrically: bias can worsen by up to 18.6% for some groups while improving by 14.1% for others, yielding misleadingly neutral aggregate outcomes. Larger models show no consistent robustness advantage, and group-specific shifts vary unpredictably across model families. Our findings show that compression fundamentally alters bias patterns, making post-quantization evaluation and intervention essential for reliable deployment.
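To make masked bias flipping concrete, the following is a minimal Python sketch rather than the study's actual pipeline. It assumes each benchmark item has already been labeled biased or unbiased under both the full-precision and the quantized model, and it uses the entropy of the full-precision model's answer distribution as a stand-in uncertainty measure. The `Response` schema, the function names, and the 0.5-nat threshold are all hypothetical illustrations, not the paper's definitions.

```python
import math
from dataclasses import dataclass


@dataclass
class Response:
    """One benchmark item scored before and after quantization (hypothetical schema)."""
    biased_fp: bool        # labeled biased under the full-precision model
    biased_q: bool         # labeled biased under the quantized model
    probs_fp: list[float]  # full-precision probabilities over answer options


def aggregate_bias(responses, quantized):
    """Fraction of responses labeled biased: an aggregate score that can hide flips."""
    flags = [r.biased_q if quantized else r.biased_fp for r in responses]
    return sum(flags) / len(flags)


def flip_rate(responses):
    """Fraction of responses whose biased/unbiased label changes after quantization."""
    if not responses:
        return math.nan  # undefined for an empty group
    return sum(r.biased_fp != r.biased_q for r in responses) / len(responses)


def entropy(probs):
    """Shannon entropy in nats, used here as an assumed uncertainty proxy."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def flip_rate_by_uncertainty(responses, threshold):
    """Return (high-uncertainty flip rate, low-uncertainty flip rate) for an assumed threshold."""
    high = [r for r in responses if entropy(r.probs_fp) >= threshold]
    low = [r for r in responses if entropy(r.probs_fp) < threshold]
    return flip_rate(high), flip_rate(low)


# Toy data for illustration only, not results from the study.
uncertain = [0.5, 0.5]    # entropy ~0.693 nats
confident = [0.95, 0.05]  # entropy ~0.199 nats
toy = [
    Response(True, False, uncertain),   # flips biased -> unbiased
    Response(False, True, uncertain),   # flips unbiased -> biased
    Response(True, True, uncertain),
    Response(False, False, uncertain),
    Response(True, True, confident),
    Response(False, False, confident),
    Response(True, True, confident),
    Response(False, False, confident),
]

print(aggregate_bias(toy, quantized=False), aggregate_bias(toy, quantized=True))
print(flip_rate(toy))
print(flip_rate_by_uncertainty(toy, threshold=0.5))
```

In this toy example the aggregate bias rate is 0.5 both before and after quantization, yet a quarter of the responses flip, and every flip comes from the high-uncertainty items. The two opposite flips cancel in the aggregate, which is the masking effect the abstract describes.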
