
Extending LLMs' Context Window with 100 Samples

January 13, 2024
Authors: Yikai Zhang, Junlong Li, Pengfei Liu
cs.AI

Abstract

Large Language Models (LLMs) are known to have limited extrapolation ability beyond their pre-trained context window, constraining their application in downstream tasks with lengthy inputs. Recent studies have sought to extend LLMs' context window by modifying rotary position embedding (RoPE), a popular position encoding method adopted by well-known LLMs such as LLaMA, PaLM, and GPT-NeoX. However, prior works like Position Interpolation (PI) and YaRN are resource-intensive and lack comparative experiments to assess their applicability. In this work, we identify the inherent need for LLMs' attention entropy (i.e. the information entropy of attention scores) to maintain stability and introduce a novel extension to RoPE which combines adjusting RoPE's base frequency and scaling the attention logits to help LLMs efficiently adapt to a larger context window. We validate the superiority of our method in both fine-tuning performance and robustness across different context window sizes on various context-demanding tasks. Notably, our method extends the context window of LLaMA-2-7B-Chat to 16,384 with only 100 samples and 6 training steps, showcasing extraordinary efficiency. Finally, we also explore how data compositions and training curricula affect context window extension for specific downstream tasks, suggesting fine-tuning LLMs with lengthy conversations as a good starting point. We release our code and SFT data at https://github.com/GAIR-NLP/Entropy-ABF.
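The abstract names two ingredients: adjusting RoPE's base frequency and scaling the attention logits so that attention entropy stays stable over a longer window. Below is a minimal, illustrative PyTorch sketch of those two pieces only; the base value of 500000.0, the log-ratio logit-scale schedule, and all function names are assumptions made for illustration, not the authors' released implementation (see the linked repository for that).

```python
import math
import torch

def rope_cache(seq_len: int, head_dim: int, base: float = 500000.0):
    """Precompute RoPE cos/sin tables with a configurable base frequency.

    Raising `base` above the usual 10,000 slows the rotation of low-frequency
    channels, one common way to accommodate longer contexts (assumed value).
    """
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(seq_len).float()
    angles = torch.outer(positions, inv_freq)            # (seq_len, head_dim / 2)
    return torch.cos(angles), torch.sin(angles)

def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor):
    """Rotate the (even, odd) channel pairs of x (seq_len, head_dim) by the cached angles."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    # Grouped output layout; q and k use the same layout, so dot products are consistent.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def attention_with_scaled_logits(q, k, v, logit_scale: float = 1.0):
    """Scaled dot-product attention with an extra scalar applied to the logits."""
    d = q.shape[-1]
    logits = (q @ k.transpose(-2, -1)) / math.sqrt(d)
    return torch.softmax(logit_scale * logits, dim=-1) @ v

# Toy usage on a short sequence; the logit scale below is computed as if the
# window were extended from a 4,096-token training length to 16,384 tokens.
# This particular schedule is an assumption, not the paper's exact formula.
train_len, target_len, toy_len, head_dim = 4_096, 16_384, 1_024, 64
cos, sin = rope_cache(toy_len, head_dim, base=500000.0)
q = apply_rope(torch.randn(toy_len, head_dim), cos, sin)
k = apply_rope(torch.randn(toy_len, head_dim), cos, sin)
v = torch.randn(toy_len, head_dim)
logit_scale = math.log(target_len) / math.log(train_len)
out = attention_with_scaled_logits(q, k, v, logit_scale)
print(out.shape)  # torch.Size([1024, 64])
```

The scalar on the logits sharpens the softmax as the context grows, counteracting the entropy increase that comes from attending over more positions; the exact factor and training recipe (100 samples, 6 steps) are detailed in the paper and released code.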