Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
March 18, 2024
作者: Junyuan Hong, Jinhao Duan, Chenhui Zhang, Zhangheng Li, Chulin Xie, Kelsey Lieberman, James Diffenderfer, Brian Bartoldson, Ajay Jaiswal, Kaidi Xu, Bhavya Kailkhura, Dan Hendrycks, Dawn Song, Zhangyang Wang, Bo Li
cs.AI
Abstract
Compressing high-capability Large Language Models (LLMs) has emerged as a
favored strategy for resource-efficient inference. While state-of-the-art
(SoTA) compression methods boast impressive advancements in preserving benign
task performance, the potential risks of compression in terms of safety and
trustworthiness have been largely neglected. This study conducts the first
thorough evaluation of three (3) leading LLMs using five (5) SoTA compression
techniques across eight (8) trustworthiness dimensions. Our experiments
highlight the intricate interplay between compression and trustworthiness,
revealing some interesting patterns. We find that quantization is currently a
more effective approach than pruning in achieving efficiency and
trustworthiness simultaneously. For instance, a 4-bit quantized model retains
the trustworthiness of its original counterpart, but model pruning
significantly degrades trustworthiness, even at 50% sparsity. Moreover,
employing quantization within a moderate bit range could unexpectedly improve
certain trustworthiness dimensions such as ethics and fairness. Conversely,
extreme quantization to very low bit levels (3 bits) tends to significantly
reduce trustworthiness. This increased risk cannot be uncovered by looking at
benign performance alone, thereby mandating comprehensive trustworthiness
evaluation in practice. These findings culminate in practical recommendations
for simultaneously achieving high utility, efficiency, and trustworthiness in
LLMs. Models and code are available at https://decoding-comp-trust.github.io/.
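To make the two compression families compared in the abstract concrete, here is a minimal, self-contained sketch (not the paper's code, which uses real SoTA methods on full LLMs): symmetric uniform quantization of weights to a given bit width, and magnitude-based unstructured pruning to a given sparsity. Function names and the toy weight vector are illustrative assumptions.

```python
# Illustrative toy versions of the two compression families the paper
# evaluates: k-bit weight quantization and magnitude pruning at a target
# sparsity. Real pipelines (e.g. GPTQ, AWQ, SparseGPT) are far more involved.

def quantize(weights, bits):
    """Symmetric uniform quantization: snap each weight to one of
    2**bits - 1 evenly spaced levels spanning [-max|w|, +max|w|]."""
    scale = max(abs(w) for w in weights) / (2 ** (bits - 1) - 1)
    return [round(w / scale) * scale for w in weights]

def prune(weights, sparsity):
    """Unstructured magnitude pruning: zero out the fraction `sparsity`
    of weights with the smallest absolute value."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned

w = [0.9, -0.1, 0.4, -0.8, 0.05, 0.6]  # toy weight vector
w4 = quantize(w, 4)     # 4-bit: fine-grained, small rounding error
w3 = quantize(w, 3)     # 3-bit: coarser grid, larger distortion
w50 = prune(w, 0.5)     # 50% sparsity: half the weights zeroed
```

The sketch mirrors the abstract's findings at the level of raw distortion: with only 3 bits the quantization grid becomes coarse enough to collapse small weights to zero, while 50% pruning discards half the parameters outright, which is one intuition for why trustworthiness degrades more sharply under aggressive compression.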