Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

Abstract

Compressing high-capability Large Language Models (LLMs) has emerged as a favored strategy for resource-efficient inference. While state-of-the-art (SoTA) compression methods boast impressive advances in preserving benign task performance, the potential risks of compression in terms of safety and trustworthiness have been largely neglected. This study conducts the first thorough evaluation of three (3) leading LLMs using five (5) SoTA compression techniques across eight (8) trustworthiness dimensions. Our experiments highlight the intricate interplay between compression and trustworthiness and reveal several patterns. We find that quantization is currently a more effective approach than pruning for achieving efficiency and trustworthiness simultaneously. For instance, a 4-bit quantized model retains the trustworthiness of its original counterpart, whereas model pruning significantly degrades trustworthiness, even at 50% sparsity. Moreover, quantization within a moderate bit range can unexpectedly improve certain trustworthiness dimensions, such as ethics and fairness. Conversely, extreme quantization to very low bit levels (3 bits) tends to reduce trustworthiness significantly. This increased risk cannot be uncovered by looking at benign performance alone, which in turn mandates comprehensive trustworthiness evaluation in practice. These findings culminate in practical recommendations for simultaneously achieving high utility, efficiency, and trustworthiness in LLMs. Models and code are available at https://decoding-comp-trust.github.io/.
