LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
February 21, 2024
Authors: Yiran Ding, Li Lyna Zhang, Chengruidong Zhang, Yuanyuan Xu, Ning Shang, Jiahang Xu, Fan Yang, Mao Yang
cs.AI
Abstract
A large context window is a desirable feature in large language models (LLMs).
However, due to high fine-tuning costs, scarcity of long texts, and
catastrophic values introduced by new token positions, current extended context
windows are limited to around 128k tokens. This paper introduces LongRoPE that,
for the first time, extends the context window of pre-trained LLMs to an
impressive 2048k tokens, with only up to 1k fine-tuning steps at training
lengths within 256k, while maintaining performance at the original short context
window. This is achieved by three key innovations: (i) we identify and exploit
two forms of non-uniformities in positional interpolation through an efficient
search, providing a better initialization for fine-tuning and enabling an 8x
extension in non-fine-tuning scenarios; (ii) we introduce a progressive
extension strategy that first fine-tunes a 256k length LLM and then conducts a
second positional interpolation on the fine-tuned extended LLM to achieve a
2048k context window; (iii) we readjust LongRoPE on 8k length to recover the
short context window performance. Extensive experiments on LLaMA2 and Mistral
across various tasks demonstrate the effectiveness of our method. Models
extended via LongRoPE retain the original architecture with minor modifications
to the positional embedding, and can reuse most pre-existing optimizations.
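The mechanism behind innovation (i) can be sketched briefly. In RoPE, each pair of embedding dimensions rotates at its own frequency, and positional interpolation slows those rotations so that longer sequences stay inside the rotation range seen during pre-training. LongRoPE's observation is that this rescaling should be non-uniform: each frequency dimension gets its own factor (found via the paper's efficient search), and the earliest token positions are left largely un-interpolated. The snippet below is a minimal sketch of that idea, not the authors' implementation; the function name and the `rescale` and `keep_first` parameters are hypothetical stand-ins for the searched quantities.

```python
import torch

def non_uniform_rope_angles(positions, dim, base=10000.0,
                            rescale=None, keep_first=0):
    """Toy non-uniform RoPE positional interpolation.

    positions:  1-D integer tensor of token positions.
    rescale:    per-dimension interpolation factors (hypothetical stand-in
                for the paper's searched factors); a single shared value
                would recover uniform positional interpolation.
    keep_first: number of initial positions left un-interpolated.
    """
    # Standard RoPE inverse frequencies: theta_i = base^(-2i / dim)
    inv_freq = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)

    if rescale is None:
        rescale = torch.ones_like(inv_freq)

    pos = positions.to(torch.float32).unsqueeze(-1)  # (seq_len, 1)
    # Non-uniform interpolation: slow each dimension by its own factor.
    angles = pos * (inv_freq / rescale)              # (seq_len, dim/2)

    # Leave the first `keep_first` positions at their original angles,
    # mirroring the finding that initial tokens need little interpolation.
    if keep_first > 0:
        original = pos * inv_freq
        mask = (positions < keep_first).unsqueeze(-1)
        angles = torch.where(mask, original, angles)

    # The cos/sin tables are consumed by the usual RoPE rotation.
    return angles.cos(), angles.sin()
```

A single shared factor recovers standard positional interpolation; the paper's point is that searching for per-dimension factors (while sparing the initial positions) yields a much better starting point, enabling the 8x extension without fine-tuning and, applied a second time to a model already fine-tuned at 256k, the 2048k window.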