E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
January 13, 2024
Authors: Jiaheng Liu, Zhiqi Bai, Yuanxing Zhang, Chenchen Zhang, Yu Zhang, Ge Zhang, Jiakai Wang, Haoran Que, Yukang Chen, Wenbo Su, Tiezheng Ge, Jie Fu, Wenhu Chen, Bo Zheng
cs.AI
Abstract
Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources. Existing long-context extension methods usually need additional training procedures to support the corresponding long-context windows, where long-context training data (e.g., 32k) is needed and high GPU training costs are incurred. To address these issues, we propose an Efficient and Extreme length extension method for Large Language Models, called E^2-LLM, which requires only one training procedure, dramatically reduces computation cost, and removes the need to collect long-context data. Concretely, first, the training data for our E^2-LLM only requires a short length (e.g., 4k), which greatly reduces the tuning cost. Second, the training procedure on the short training context window is performed only once, and we can support different evaluation context windows at inference. Third, in E^2-LLM, based on RoPE position embeddings, we introduce two different augmentation methods on the scale and position index parameters for different samples in training, which aims to make the model more robust to different relative position differences when directly interpolating to arbitrary context lengths at inference. Comprehensive experimental results on multiple benchmark datasets demonstrate the effectiveness of our E^2-LLM on challenging long-context tasks.
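To make the RoPE-based augmentation idea concrete, here is a minimal sketch (not the authors' implementation) of rotary embeddings with an interpolation `scale` and a position-index `offset`, two hypothetical knobs standing in for the scale and position-index parameters that E^2-LLM varies per training sample; the ranges chosen below are illustrative assumptions only.

```python
import torch

def rope_angles(positions, dim, base=10000.0, scale=1.0, offset=0):
    """Rotary-embedding angles with a scale factor and a position-index offset.

    `scale` divides the (shifted) positions, as in position interpolation;
    `offset` shifts the position indices. Both are illustrative knobs for
    per-sample augmentation of RoPE parameters, not the paper's exact scheme.
    """
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    pos = (positions.float() + offset) / scale
    return torch.outer(pos, inv_freq)  # shape: (seq_len, dim // 2)

def apply_rope(x, angles):
    """Rotate consecutive channel pairs of x by the given angles."""
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Per-sample augmentation during short-context (e.g., 4k) training:
# randomly vary the interpolation scale and position offset so the model
# sees many relative-position patterns within a short window, then choose
# a large scale at inference to cover a long evaluation context window.
seq_len, dim = 4096, 128
positions = torch.arange(seq_len)
scale = float(torch.randint(1, 17, (1,)))    # assumed range, up to 16x extension
offset = int(torch.randint(0, 4096, (1,)))   # assumed range of index shifts
q = torch.randn(seq_len, dim)
q_rot = apply_rope(q, rope_angles(positions, dim, scale=scale, offset=offset))
```

At inference, the same functions would be called once with the scale set to roughly (target context length / training context length), so no further training is needed to evaluate at a longer window.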