Scaling Embedding Layers in Language Models

Abstract

We propose SCONE (Scalable, Contextualized, Offloaded, N-gram Embedding), a method for extending input embedding layers to enhance language model performance as layer size scales. To avoid increased decoding costs, SCONE retains the original vocabulary while introducing embeddings for a set of frequent n-grams. These embeddings provide a contextualized representation for each input token and are learned with a separate model during training. During inference, they are precomputed and stored in off-accelerator memory, with minimal impact on inference speed. SCONE enables two new scaling strategies: increasing the number of cached n-gram embeddings and scaling the model used to learn them, both while maintaining fixed inference-time FLOPS. We show that scaling both aspects allows SCONE to outperform a 1.9B-parameter baseline across diverse corpora while using only half the inference-time FLOPS.
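
The inference-time lookup described in the abstract might be sketched as follows. This is an illustrative sketch only, not the authors' implementation. The function name `contextual_embeddings`, the longest-match rule, the fallback to the ordinary token embedding, the `max_n` parameter, and the plain Python dict standing in for the off-accelerator table of precomputed n-gram embeddings are all assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def contextual_embeddings(
    tokens: Sequence[int],
    token_table: Dict[int, Vector],
    ngram_cache: Dict[Tuple[int, ...], Vector],
    max_n: int = 3,
) -> List[Vector]:
    """Return one input embedding per token position.

    Assumed rule (hypothetical): for each position, find the longest cached
    n-gram (2 <= n <= max_n) that ends at that position and use its
    precomputed embedding; otherwise use the ordinary token embedding.
    """
    out: List[Vector] = []
    for i in range(len(tokens)):
        chosen = token_table[tokens[i]]  # fallback: plain token embedding
        # Try the longest candidate n-gram first, then shorter ones.
        for n in range(min(max_n, i + 1), 1, -1):
            key = tuple(tokens[i - n + 1 : i + 1])
            if key in ngram_cache:
                chosen = ngram_cache[key]
                break
        out.append(chosen)
    return out


# Toy data: three vocabulary tokens and two cached frequent n-grams.
token_table = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.5, 0.5]}
ngram_cache = {(1, 2): [9.0, 9.0], (1, 2, 3): [7.0, 7.0]}

embs = contextual_embeddings([1, 2, 3, 1], token_table, ngram_cache)
```

In this toy run, positions 1 and 2 pick up the cached embeddings for `(1, 2)` and `(1, 2, 3)`, while positions 0 and 3 fall back to the token table. Because the lookup is a table read rather than a forward pass, growing `ngram_cache` or training it with a larger model does not add compute to this step, which is consistent with the fixed inference-time FLOPS claimed in the abstract.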
