

Language Modeling Is Compression

September 19, 2023
Authors: Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau, Marcus Hutter, Joel Veness
cs.AI

Abstract

It has long been established that predictive models can be transformed into lossless compressors and vice versa. Incidentally, in recent years, the machine learning community has focused on training increasingly large and powerful self-supervised (language) models. Since these large language models exhibit impressive predictive capabilities, they are well-positioned to be strong compressors. In this work, we advocate for viewing the prediction problem through the lens of compression and evaluate the compression capabilities of large (foundation) models. We show that large language models are powerful general-purpose predictors and that the compression viewpoint provides novel insights into scaling laws, tokenization, and in-context learning. For example, Chinchilla 70B, while trained primarily on text, compresses ImageNet patches to 43.4% and LibriSpeech samples to 16.4% of their raw size, beating domain-specific compressors like PNG (58.5%) or FLAC (30.3%), respectively. Finally, we show that the prediction-compression equivalence allows us to use any compressor (like gzip) to build a conditional generative model.
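The closing claim, that any compressor yields a conditional generative model, follows from the standard correspondence between code lengths and probabilities: a compressor that spends l(x) bits on x implicitly assigns it probability roughly 2^(-l(x)), so conditionals on a context can be read off from length differences. The snippet below is a minimal sketch of that idea using Python's built-in gzip module; it is not the paper's exact procedure (the authors pair the compressor-induced conditional with arithmetic coding), and the function names, the 256-way byte loop, and the sampling loop are illustrative choices.

```python
import gzip
import math
import random

def compressed_len_bits(data: bytes) -> int:
    """Length in bits of the gzip-compressed representation of `data`."""
    return 8 * len(gzip.compress(data))

def next_byte_distribution(context: bytes):
    """Approximate conditional distribution over the next byte induced by gzip.

    A compressor spending l(x) bits on x implicitly assigns probability ~ 2^(-l(x)),
    so p(b | context) is proportional to 2^(-(l(context + b) - l(context))).
    Note: gzip's output is byte-granular, so many candidates may tie; this is
    only a coarse illustration of the compressor-induced conditional.
    """
    base = compressed_len_bits(context)
    log_weights = [
        -(compressed_len_bits(context + bytes([b])) - base) for b in range(256)
    ]
    m = max(log_weights)                      # subtract max for numerical stability
    weights = [2.0 ** (lw - m) for lw in log_weights]
    total = sum(weights)
    return [w / total for w in weights]

def sample_continuation(context: bytes, num_bytes: int = 8) -> bytes:
    """Generate bytes by repeatedly sampling the gzip-induced conditional."""
    out = bytearray(context)
    for _ in range(num_bytes):
        probs = next_byte_distribution(bytes(out))
        out.append(random.choices(range(256), weights=probs, k=1)[0])
    return bytes(out)

if __name__ == "__main__":
    print(sample_continuation(b"the quick brown fox jumps over the "))
```

The same length-to-probability reading also underlies the reported ratios: a compression rate such as 43.4% is simply the compressed size divided by the raw size, whether the codes come from PNG, FLAC, gzip, or a language model driving an arithmetic coder.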