Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity

Abstract

A range of recent works addresses the problem of compressing a sequence of tokens into a shorter sequence of real-valued vectors that are used as inputs in place of token embeddings or a key-value cache. These approaches make it possible to reduce the amount of compute in existing language models. Despite relying on powerful models as encoders, the maximum attainable lossless compression ratio is typically no higher than x10. This is intriguing because, in theory, the maximum information capacity of large real-valued vectors is far beyond the reported rates, even at 16-bit precision and a modest vector size. In this work, we explore the limits of compression by replacing the encoder with a per-sample optimization procedure. We show that vectors with compression ratios of up to x1500 exist, which reveals a gap of two orders of magnitude between existing and practically attainable solutions. Furthermore, we empirically show that the compression limits are determined not by the length of the input but by the amount of uncertainty to be reduced, namely the cross-entropy loss on the sequence without any conditioning. The obtained limits highlight the substantial gap between the theoretical capacity of input embeddings and their practical utilization, suggesting significant room for optimization in model design.
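
The capacity argument in the abstract can be made concrete with back-of-the-envelope arithmetic. The sketch below is illustrative only: the vector size of 4096 and the vocabulary size of 128,000 are assumed values, not figures from the paper, and all function names are hypothetical. It counts the raw bits stored in one 16-bit vector and how many uniformly random tokens that many bits could index. It also computes the cross-entropy of a sequence in bits, the quantity the abstract identifies as governing the compression limit.

```python
import math


def vector_capacity_bits(dim: int, bits_per_value: int = 16) -> int:
    """Raw storage of one vector: coordinates times bits per coordinate."""
    return dim * bits_per_value


def max_uniform_tokens(dim: int, vocab_size: int, bits_per_value: int = 16) -> float:
    """Tokens that fit if every token costs log2(vocab_size) bits,
    i.e. the text is uniformly random over the vocabulary."""
    return vector_capacity_bits(dim, bits_per_value) / math.log2(vocab_size)


def sequence_cross_entropy_bits(token_probs: list[float]) -> float:
    """Total cross-entropy of a sequence in bits, given the probability
    a model assigns to each true token."""
    return -sum(math.log2(p) for p in token_probs)


# Assumed sizes, chosen only for illustration (not taken from the paper).
DIM = 4096
VOCAB_SIZE = 128_000

capacity = vector_capacity_bits(DIM)
tokens = max_uniform_tokens(DIM, VOCAB_SIZE)
print(f"capacity: {capacity} bits, about {tokens:.0f} uniformly random tokens")
```

Text that a language model predicts well costs fewer bits per token than log2 of the vocabulary size, so under this simple counting argument the same vector could in principle index a longer sequence of such text.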
