E^2-LLM: Efficient and Extreme Length Extension of Large Language Models

Abstract

Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources. Existing long-context extension methods usually need additional training procedures to support the corresponding long-context windows, which require long-context training data (e.g., 32k) and incur high GPU training costs. To address these issues, we propose an Efficient and Extreme length extension method for Large Language Models, called E^2-LLM, which needs only one training procedure, dramatically reduces computation cost, and removes the need to collect long-context data. Concretely, first, the training data of E^2-LLM only requires a short length (e.g., 4k), which greatly reduces the tuning cost. Second, the training procedure on the short training context window is performed only once, and different evaluation context windows can be supported at inference. Third, based on RoPE position embeddings, E^2-LLM introduces two different augmentation methods on the scale and position index parameters for different samples in training. These augmentations aim to make the model more robust to different relative position differences when directly interpolating to an arbitrary context length at inference. Comprehensive experimental results on multiple benchmark datasets demonstrate the effectiveness of E^2-LLM on challenging long-context tasks.
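The third point, augmenting the scale and position index parameters of RoPE, can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the authors' implementation. It applies linear position interpolation (the position index divided by a scale factor `scale`) to standard RoPE. The helper `sample_training_positions` is hypothetical: it draws a scale and a shifted start index per training sample from caller-supplied choices, which are placeholders rather than the distributions used in the paper. `inference_scale` assumes that the inference-time scale is the ratio of evaluation length to training length.

```python
import math
import random


def rope_angles(position, dim, scale=1.0, base=10000.0):
    # Standard RoPE frequencies base**(-2i/dim); dividing the position by
    # `scale` is linear position interpolation (scale=1.0 is plain RoPE).
    return [(position / scale) * base ** (-2 * i / dim) for i in range(dim // 2)]


def apply_rope(x, position, scale=1.0, base=10000.0):
    # Rotate each consecutive pair (x[2i], x[2i+1]) of an even-length vector.
    out = []
    for i, theta in enumerate(rope_angles(position, len(x), scale, base)):
        a, b = x[2 * i], x[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out += [a * c - b * s, a * s + b * c]
    return out


def attention_score(q, k, q_pos, k_pos, scale=1.0):
    # Unnormalized attention logit between a RoPE-rotated query and key.
    qr = apply_rope(q, q_pos, scale)
    kr = apply_rope(k, k_pos, scale)
    return sum(u * v for u, v in zip(qr, kr))


def sample_training_positions(train_len, scales, max_offset, rng):
    # HYPOTHETICAL augmentation: pick a scale and a shifted start index per
    # sample. `scales` and `max_offset` are placeholders, not the paper's values.
    scale = rng.choice(scales)
    start = rng.randint(0, max_offset)
    return scale, list(range(start, start + train_len))


def inference_scale(eval_len, train_len):
    # ASSUMPTION: at inference, scale = eval_len / train_len so that scaled
    # positions stay inside the range seen with a short training window.
    return max(1.0, eval_len / train_len)


if __name__ == "__main__":
    q = [0.3, -1.2, 0.5, 0.8]
    k = [1.0, 0.4, -0.7, 0.2]
    # Same scaled positions (4.0 and 2.0) give the same score.
    print(attention_score(q, k, 8, 4, scale=2.0), attention_score(q, k, 4, 2))
    # Shifting both positions by a constant leaves the score unchanged (up to rounding).
    print(attention_score(q, k, 100, 90), attention_score(q, k, 10, 0))
    print(inference_scale(32768, 4096))  # 8.0
    print(sample_training_positions(4096, [1.0, 2.0, 4.0], 8192, random.Random(0))[0])
```

Because the RoPE score depends only on (q_pos - k_pos) / scale, varying the scale and the start index across samples exposes the model to many scaled relative distances while every training sequence stays short (e.g., 4k). This matches the abstract's stated goal of robustness to relative differences when interpolating to longer windows at inference.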