YaRN: Efficient Context Window Extension of Large Language Models
August 31, 2023
Authors: Bowen Peng, Jeffrey Quesnelle, Honglu Fan, Enrico Shippole
cs.AI
Abstract
Rotary Position Embeddings (RoPE) have been shown to effectively encode positional information in transformer-based language models. However, these models fail to generalize past the sequence length they were trained on. We present YaRN (Yet another RoPE extensioN method), a compute-efficient method to extend the context window of such models, requiring 10x fewer tokens and 2.5x fewer training steps than previous methods. Using YaRN, we show that LLaMA models can effectively utilize and extrapolate to context lengths much longer than their original pre-training would allow, while also surpassing the previous state-of-the-art at context window extension. In addition, we demonstrate that YaRN exhibits the capability to extrapolate beyond the limited context of a fine-tuning dataset. We publish the checkpoints of Llama 2 7B/13B fine-tuned using YaRN with 64k and 128k context windows at https://github.com/jquesnelle/yarn
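
The abstract does not spell out YaRN's interpolation formula. As background, the sketch below shows standard RoPE rotation and a simple position-rescaling trick (plain positional interpolation, not YaRN itself, which instead rescales the per-dimension frequencies). The function name `rope_rotate` and the 4096/16384 context lengths are illustrative assumptions, not values from the paper.

```python
import numpy as np

def rope_rotate(x, positions, base=10000.0):
    """Apply standard Rotary Position Embeddings (RoPE).

    x: array of shape (seq_len, dim), dim even.
    positions: position (possibly fractional) for each row of x.
    Channel pair (2i, 2i+1) is rotated by angle position * base**(-2i/dim).
    """
    seq_len, dim = x.shape
    half = dim // 2
    # Per-pair rotation frequencies (the theta_i of the RoPE paper).
    inv_freq = base ** (-np.arange(half) * 2.0 / dim)
    angles = np.outer(positions, inv_freq)          # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                 # even / odd channels
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Illustrative context extension via plain positional interpolation:
# positions beyond the trained window are squeezed back into the trained
# range. (YaRN's per-dimension frequency rescaling is not reproduced here,
# since the abstract does not give the formula.)
trained_len, target_len = 4096, 16384               # assumed lengths
scale = trained_len / target_len
q = np.random.randn(8, 64)
positions = np.arange(16000, 16008)                 # beyond the trained window
q_interp = rope_rotate(q, positions * scale)        # rescaled positions
```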