E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
January 13, 2024
Authors: Jiaheng Liu, Zhiqi Bai, Yuanxing Zhang, Chenchen Zhang, Yu Zhang, Ge Zhang, Jiakai Wang, Haoran Que, Yukang Chen, Wenbo Su, Tiezheng Ge, Jie Fu, Wenhu Chen, Bo Zheng
cs.AI
Abstract
Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources. Existing long-context extension methods usually need additional training procedures to support the corresponding long-context windows, where long-context training data (e.g., 32k) is needed and high GPU training costs are incurred. To address these issues, we propose an Efficient and Extreme length extension method for Large Language Models, called E^2-LLM, which requires only one training procedure, dramatically reduces computation cost, and removes the need to collect long-context data. Concretely, first, the training data for our E^2-LLM only requires a short length (e.g., 4k), which greatly reduces the tuning cost. Second, the training procedure on the short training context window is performed only once, and we can support different evaluation context windows at inference. Third, in E^2-LLM, based on RoPE position embeddings, we introduce two different augmentation methods on the scale and position index parameters for different samples in training, which aims to make the model more robust to different relative position differences when directly interpolating to arbitrary context lengths at inference. Comprehensive experimental results on multiple benchmark datasets demonstrate the effectiveness of our E^2-LLM on challenging long-context tasks.
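To make the RoPE-based augmentation idea concrete, here is a minimal sketch (not the authors' implementation) of rotary embeddings with an interpolation `scale` and a position-index `offset`, two hypothetical knobs standing in for the scale and position-index parameters that E^2-LLM varies per training sample; the ranges chosen below are illustrative assumptions only.

```python
import torch

def rope_angles(positions, dim, base=10000.0, scale=1.0, offset=0):
    """Rotary-embedding angles with a scale factor and a position-index offset.

    `scale` divides the (shifted) positions, as in position interpolation;
    `offset` shifts the position indices. Both are illustrative knobs for
    per-sample augmentation of RoPE parameters, not the paper's exact scheme.
    """
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    pos = (positions.float() + offset) / scale
    return torch.outer(pos, inv_freq)  # shape: (seq_len, dim // 2)

def apply_rope(x, angles):
    """Rotate consecutive channel pairs of x by the given angles."""
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Per-sample augmentation during short-context (e.g., 4k) training:
# randomly vary the interpolation scale and position offset so the model
# sees many relative-position patterns within a short window, then choose
# a large scale at inference to cover a long evaluation context window.
seq_len, dim = 4096, 128
positions = torch.arange(seq_len)
scale = float(torch.randint(1, 17, (1,)))    # assumed range, up to 16x extension
offset = int(torch.randint(0, 4096, (1,)))   # assumed range of index shifts
q = torch.randn(seq_len, dim)
q_rot = apply_rope(q, rope_angles(positions, dim, scale=scale, offset=offset))
```

At inference, the same functions would be called once with the scale set to roughly (target context length / training context length), so no further training is needed to evaluate at a longer window.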