Language Modeling Is Compression
September 19, 2023
Authors: Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau, Marcus Hutter, Joel Veness
cs.AI
Abstract
It has long been established that predictive models can be transformed into
lossless compressors and vice versa. Incidentally, in recent years, the machine
learning community has focused on training increasingly large and powerful
self-supervised (language) models. Since these large language models exhibit
impressive predictive capabilities, they are well-positioned to be strong
compressors. In this work, we advocate for viewing the prediction problem
through the lens of compression and evaluate the compression capabilities of
large (foundation) models. We show that large language models are powerful
general-purpose predictors and that the compression viewpoint provides novel
insights into scaling laws, tokenization, and in-context learning. For example,
Chinchilla 70B, while trained primarily on text, compresses ImageNet patches to
43.4% and LibriSpeech samples to 16.4% of their raw size, beating
domain-specific compressors like PNG (58.5%) or FLAC (30.3%), respectively.
Finally, we show that the prediction-compression equivalence allows us to use
any compressor (like gzip) to build a conditional generative model.
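
The final claim rests on the fact that a lossless compressor's code lengths can be read as negative log-probabilities, so any compressor implicitly defines a conditional distribution over continuations; conversely, a language model becomes a lossless compressor by arithmetic-coding each token with roughly -log2 p(token | context) bits, which is how compression rates like those quoted for Chinchilla 70B are measured. The Python sketch below is only an illustrative reading of the gzip direction, not the paper's implementation: it scores each candidate next byte by how many extra compressed bytes gzip needs after the context, and decodes greedily. The function names and the greedy decoding rule are assumptions made for the example.

import gzip

def next_byte_scores(context: bytes) -> dict[int, float]:
    """Read gzip's code lengths as an unnormalised next-byte distribution:
    a candidate that costs fewer extra compressed bytes is more probable."""
    base = len(gzip.compress(context))
    scores = {}
    for b in range(256):
        extra = len(gzip.compress(context + bytes([b]))) - base
        scores[b] = 2.0 ** (-8 * extra)  # code length in bits -> 2^(-bits)
    return scores

def generate(context: bytes, steps: int) -> bytes:
    """Greedy conditional generation driven only by the compressor."""
    out = bytearray(context)
    for _ in range(steps):
        scores = next_byte_scores(bytes(out))
        out.append(max(scores, key=scores.get))
    return bytes(out)

print(generate(b"abcabcabcabc", steps=6))

Because gzip emits whole bytes, many candidates tie and the samples are crude, as one would expect from such a weak predictive model; swapping in a stronger compressor (or a language model used as one) yields a correspondingly stronger conditional generative model.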