Large Language Model Evaluation via Matrix Nuclear-Norm

Abstract

As large language models (LLMs) continue to evolve, efficient evaluation metrics are vital for assessing their ability to compress information and reduce redundancy. Traditional metrics such as Matrix Entropy offer valuable insights, but they are computationally intensive for large-scale models because they rely on Singular Value Decomposition (SVD), which has \( O(n^3) \) time complexity. To mitigate this issue, we introduce the Matrix Nuclear-Norm, which not only serves as a metric to quantify the data compression proficiency of LLMs but also provides a convex approximation of matrix rank that captures both predictive discriminability and diversity. By employing the \( L_{1,2} \)-norm to further approximate the nuclear norm, we can effectively assess a model's information compression capabilities. This approach reduces the time complexity to \( O(n^2) \) and eliminates the need for SVD computation. Consequently, the Matrix Nuclear-Norm runs 8 to 24 times faster than Matrix Entropy on the Cerebras-GPT models as their size increases from 111M to 6.7B parameters. This performance gap becomes more pronounced with larger models, as validated in tests with other models such as Pythia. Additionally, evaluations on benchmarks and model responses confirm that the proposed Matrix Nuclear-Norm is a reliable, scalable, and efficient tool for assessing LLM performance, striking a balance between accuracy and computational efficiency. The code is available at https://github.com/MLGroupJLU/MatrixNuclearNorm.
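To make the idea of an SVD-free approximation concrete, here is a minimal sketch. The abstract does not give the exact formula, so this is an assumption rather than the authors' implementation: it takes the proxy to be the sum of the largest column L2 norms of a matrix. The function names `column_l2_norms` and `nuclear_norm_proxy`, and the default of `top_d = min(rows, cols)`, are hypothetical choices made for illustration. The two toy matrices have nuclear norms that are known analytically, so the proxy can be compared without running an SVD.

```python
import math


def column_l2_norms(matrix):
    """Return the L2 norm of each column of a row-major list-of-lists matrix."""
    n_cols = len(matrix[0])
    sums = [0.0] * n_cols
    for row in matrix:
        for j, value in enumerate(row):
            sums[j] += value * value
    return [math.sqrt(s) for s in sums]


def nuclear_norm_proxy(matrix, top_d=None):
    """SVD-free proxy: sum of the top_d largest column L2 norms.

    Assumption of this sketch: top_d defaults to min(rows, cols).
    Cost is one pass over the entries plus a sort of the column norms.
    """
    norms = sorted(column_l2_norms(matrix), reverse=True)
    if top_d is None:
        top_d = min(len(matrix), len(matrix[0]))
    return sum(norms[:top_d])


# Orthogonal columns: singular values are 3 and 4, so the nuclear norm is 7.
orthogonal = [[3.0, 0.0], [0.0, 4.0]]

# Rank-one matrix: the only nonzero singular value is 2, so the nuclear norm is 2.
rank_one = [[1.0, 1.0], [1.0, 1.0]]

print("orthogonal columns:", nuclear_norm_proxy(orthogonal))
print("rank-one matrix:   ", nuclear_norm_proxy(rank_one))
```

When the columns are orthogonal, the proxy matches the nuclear norm exactly. For the rank-one matrix it overestimates, which is consistent with the triangle-inequality bound that the sum of all column norms is never smaller than the nuclear norm. In practice a tensor library would replace the pure-Python loops.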
