Compression Represents Intelligence Linearly
April 15, 2024
Authors: Yuzhen Huang, Jinghan Zhang, Zifei Shan, Junxian He
cs.AI
Abstract
There is a belief that learning to compress well will lead to intelligence.
Recently, language modeling has been shown to be equivalent to compression,
which offers a compelling rationale for the success of large language models
(LLMs): the development of more advanced language models is essentially
enhancing compression, which facilitates intelligence. Despite such appealing
discussions, little empirical evidence exists for the interplay between
compression and intelligence. In this work, we examine their relationship in
the context of LLMs, treating LLMs as data compressors. Given the abstract
concept of "intelligence", we adopt the average downstream benchmark scores as
a surrogate, specifically targeting intelligence related to knowledge and
commonsense, coding, and mathematical reasoning. Across 12 benchmarks, our
study brings together 30 public LLMs that originate from diverse organizations.
Remarkably, we find that LLMs' intelligence -- reflected by average benchmark
scores -- almost linearly correlates with their ability to compress external
text corpora. These results provide concrete evidence supporting the belief
that superior compression indicates greater intelligence. Furthermore, our
findings suggest that compression efficiency, as an unsupervised metric derived
from raw text corpora, serves as a reliable evaluation measure that is linearly
associated with model capabilities. We open-source our compression datasets
as well as our data collection pipelines to help future researchers
assess compression properly.
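
To make the idea of "treating LLMs as data compressors" concrete, here is a minimal sketch under stated assumptions. It relies on the standard fact that an ideal entropy coder driven by a model's next-token probabilities needs about -log2 p(token) bits per token. The abstract does not name a compression metric or a correlation statistic, so bits per character and Pearson's r are assumptions made for illustration; the function names and all numbers are hypothetical and are not results from the paper.

```python
import math


def bits_per_character(token_logprobs, num_chars):
    """Ideal code length per character, given natural-log token probabilities.

    Assumed metric (not named in the abstract): total bits an ideal entropy
    coder would need for the text, divided by the text length in characters.
    """
    total_bits = -sum(token_logprobs) / math.log(2)  # convert nats to bits
    return total_bits / num_chars


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for four imaginary models; illustration only.
toy_bpc = [0.90, 0.80, 0.70, 0.60]        # lower means better compression
toy_avg_score = [30.0, 40.0, 50.0, 60.0]  # average benchmark score

r = pearson_r(toy_bpc, toy_avg_score)
print(f"Pearson r between BPC and average score: {r:.3f}")
```

In this toy setup, lower bits per character means better compression, so a strong negative correlation with benchmark scores is what a linear relationship of the kind described in the abstract would look like.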