Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

Abstract

Compressing high-capability Large Language Models (LLMs) has emerged as a favored strategy for resource-efficient inference. While state-of-the-art (SoTA) compression methods have made impressive progress in preserving benign task performance, the potential risks of compression in terms of safety and trustworthiness have been largely neglected. This study conducts the first thorough evaluation of three leading LLMs using five SoTA compression techniques across eight trustworthiness dimensions. Our experiments highlight the intricate interplay between compression and trustworthiness and reveal several patterns. We find that quantization is currently a more effective approach than pruning for achieving efficiency and trustworthiness simultaneously. For instance, a 4-bit quantized model retains the trustworthiness of its original counterpart, whereas model pruning significantly degrades trustworthiness, even at 50% sparsity. Moreover, quantization within a moderate bit range can unexpectedly improve certain trustworthiness dimensions, such as ethics and fairness. Conversely, extreme quantization to very low bit levels (3 bits) tends to reduce trustworthiness significantly. This increased risk cannot be uncovered by looking at benign performance alone, which in turn mandates comprehensive trustworthiness evaluation in practice. These findings culminate in practical recommendations for simultaneously achieving high utility, efficiency, and trustworthiness in LLMs. Models and code are available at https://decoding-comp-trust.github.io/.
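
The abstract contrasts two families of compression, quantization and pruning. As a rough illustration of what "4-bit" and "50% sparsity" mean for a list of weights, the sketch below uses plain symmetric round-to-nearest quantization and magnitude pruning. Both are simplified stand-ins chosen for illustration and are not necessarily the SoTA techniques evaluated in the study; the function names `quantize_rtn` and `prune_magnitude` and the sample weights are hypothetical.

```python
def quantize_rtn(weights, bits=4):
    """Symmetric round-to-nearest quantization to `bits` bits, then dequantize.

    Illustrative assumption: one scale shared by the whole list.
    """
    qmax = 2 ** (bits - 1) - 1              # 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)                # nothing to scale
    scale = max_abs / qmax
    out = []
    for w in weights:
        q = round(w / scale)                # nearest integer level
        q = max(-qmax - 1, min(qmax, q))    # clamp to the signed range
        out.append(q * scale)               # map back to a float
    return out


def prune_magnitude(weights, sparsity=0.5):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


if __name__ == "__main__":
    w = [0.7, -0.1, 0.35, -0.5, 0.05, 0.2, -0.9, 0.0]   # made-up weights
    print("4-bit quantized:", quantize_rtn(w, bits=4))
    print("50% pruned:     ", prune_magnitude(w, sparsity=0.5))
```

Quantization keeps every weight but snaps it to one of at most 16 levels, while pruning leaves surviving weights untouched and removes the rest outright. The sketch only shows the mechanics; it says nothing about trustworthiness, which the study measures on full models.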
