Uncertainty Drives Social Bias Changes in Quantized Large Language Models

Abstract

Post-training quantization reduces the computational cost of large language models, but it fundamentally alters their social biases in ways that aggregate metrics fail to capture. We present the first large-scale study of 50 quantized models, evaluated on PostTrainingBiasBench, a unified benchmark of 13 closed- and open-ended bias datasets. We identify a phenomenon we term quantization-induced masked bias flipping: up to 21% of responses flip between biased and unbiased states after quantization, even though aggregate bias scores remain unchanged. These flips are strongly driven by model uncertainty; high-uncertainty responses are 3-11x more likely to change than confident ones. Stronger quantization amplifies the effect, with 4-bit models exhibiting 4-6x more behavioral changes than 8-bit models. Critically, these changes affect demographic groups asymmetrically: bias can worsen by up to 18.6% for some groups while improving by 14.1% for others, producing misleadingly neutral aggregate outcomes. Larger models show no consistent robustness advantage, and group-specific shifts vary unpredictably across model families. Our findings demonstrate that compression fundamentally alters bias patterns, making post-quantization evaluation and intervention essential for reliability in practice.
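The central claim above is that an aggregate bias score can stay flat while individual responses flip and per-group bias moves in opposite directions. The following minimal Python sketch shows that bookkeeping on hypothetical toy data; it is not the paper's implementation. It assumes each response has already been labeled biased (`True`) or unbiased (`False`) for both the full-precision and the quantized model. It also assumes uncertainty is measured as the entropy of the full-precision model's answer distribution, which is an illustrative stand-in because the abstract does not specify the measure. The function names, the group labels `"A"`/`"B"`, and the 0.8-nat threshold are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def flip_rate(before: Sequence[bool], after: Sequence[bool]) -> float:
    """Fraction of prompts whose biased/unbiased label changed after quantization."""
    if len(before) != len(after) or not before:
        raise ValueError("labels must be non-empty and aligned per prompt")
    return sum(b != a for b, a in zip(before, after)) / len(before)


def aggregate_bias(labels: Sequence[bool]) -> float:
    """Share of responses labeled biased: a simple aggregate bias score."""
    return sum(labels) / len(labels)


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of an answer distribution; an assumed uncertainty proxy."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def _rate(pairs: List[Tuple[bool, bool]]) -> float:
    # Flip rate within a subset of (before, after) pairs; NaN if the subset is empty.
    return sum(b != a for b, a in pairs) / len(pairs) if pairs else math.nan


def flip_rate_by_uncertainty(before, after, uncertainties, threshold) -> Tuple[float, float]:
    """Return (high-uncertainty flip rate, low-uncertainty flip rate), split at `threshold`."""
    triples = list(zip(before, after, uncertainties))
    high = [(b, a) for b, a, u in triples if u >= threshold]
    low = [(b, a) for b, a, u in triples if u < threshold]
    return _rate(high), _rate(low)


def group_bias_shift(before, after, groups) -> Dict[str, float]:
    """Per-group change in the share of biased responses (after minus before)."""
    shifts = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        shifts[g] = (sum(after[i] for i in idx) - sum(before[i] for i in idx)) / len(idx)
    return shifts


# Toy data (hypothetical): 8 prompts labeled for a full-precision and a 4-bit model.
before = [True, True, False, False, True, False, True, False]
after = [False, True, True, False, True, False, True, False]
groups = ["A", "A", "B", "B", "A", "A", "B", "B"]  # hypothetical demographic groups
answer_probs = [  # hypothetical full-precision answer distributions over 3 options
    [0.40, 0.35, 0.25], [0.90, 0.05, 0.05], [0.34, 0.33, 0.33], [0.90, 0.05, 0.05],
    [0.90, 0.05, 0.05], [0.50, 0.30, 0.20], [0.90, 0.05, 0.05], [0.90, 0.05, 0.05],
]
uncert = [entropy(p) for p in answer_probs]

print(aggregate_bias(before), aggregate_bias(after))  # 0.5 0.5
print(flip_rate(before, after))  # 0.25
print(flip_rate_by_uncertainty(before, after, uncert, threshold=0.8))  # (0.6666666666666666, 0.0)
print(group_bias_shift(before, after, groups))  # {'A': -0.25, 'B': 0.25}
```

In this toy run the aggregate score is 0.5 both before and after quantization, yet a quarter of the responses flip, every flip falls in the high-uncertainty bucket, and the per-group shifts of -0.25 and +0.25 cancel out. That cancellation is the masking pattern the abstract describes, which is why flip rates and per-group shifts are reported alongside the aggregate score here.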