

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models

January 13, 2024
Authors: Jiaheng Liu, Zhiqi Bai, Yuanxing Zhang, Chenchen Zhang, Yu Zhang, Ge Zhang, Jiakai Wang, Haoran Que, Yukang Chen, Wenbo Su, Tiezheng Ge, Jie Fu, Wenhu Chen, Bo Zheng
cs.AI

Abstract

Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources. Existing long-context extension methods usually need additional training procedures to support the corresponding long-context windows, where long-context training data (e.g., 32k) is needed and high GPU training costs are incurred. To address these issues, we propose an Efficient and Extreme length extension method for Large Language Models, called E^2-LLM, which requires only one training procedure with dramatically reduced computation cost and removes the need to collect long-context data. Concretely, first, the training data of E^2-LLM only requires a short length (e.g., 4k), which greatly reduces the tuning cost. Second, the training procedure on the short training context window is performed only once, and we can support different evaluation context windows at inference. Third, in E^2-LLM, based on RoPE position embeddings, we introduce two different augmentation methods on the scale and position index parameters for different samples during training. These augmentations aim to make the model more robust to different relative position distances when directly interpolating to arbitrary context lengths at inference. Comprehensive experimental results on multiple benchmark datasets demonstrate the effectiveness of E^2-LLM on challenging long-context tasks.
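
The Python sketch below is our illustration of the idea described above, not the authors' released code: each short (e.g., 4k) training sample is encoded with a randomly drawn RoPE scale factor and a randomly shifted position index, so a single training run exposes the model to many interpolation settings that can then be matched to the target context length at inference. The function names and sampling ranges are illustrative assumptions.

# Minimal sketch of per-sample RoPE augmentation in the spirit of E^2-LLM.
# Scale factors and position offsets are sampled per training sample; the
# exact sampling scheme in the paper may differ from the ranges shown here.
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0,
                scale: float = 1.0) -> torch.Tensor:
    # Rotary angles for (scaled) position indices; shape (seq_len, dim // 2).
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    return torch.outer(positions.float() / scale, inv_freq)

def apply_rope(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    # Rotate adjacent channel pairs of x (seq_len, dim) by the given angles.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def augmented_positions(seq_len: int, max_scale: int = 16,
                        max_train_len: int = 4096) -> tuple[torch.Tensor, float]:
    # Per-sample augmentation: draw a random RoPE scale factor and place the
    # short sample at a random offset inside the scaled context window.
    scale = float(torch.randint(1, max_scale + 1, (1,)))
    max_start = int(scale * max_train_len) - seq_len
    start = int(torch.randint(0, max(max_start, 1), (1,)))
    return torch.arange(start, start + seq_len), scale

# Usage: a 4k-token training sample is encoded as if it sat somewhere inside
# a much longer (scale * 4k) context window.
q = torch.randn(4096, 128)                 # per-head query states (seq_len, head_dim)
pos, scale = augmented_positions(4096)
q_rot = apply_rope(q, rope_angles(pos, dim=128, scale=scale))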