YaRN: Efficient Context Window Extension of Large Language Models
August 31, 2023
Authors: Bowen Peng, Jeffrey Quesnelle, Honglu Fan, Enrico Shippole
cs.AI
Abstract
Rotary Position Embeddings (RoPE) have been shown to effectively encode positional information in transformer-based language models. However, these models fail to generalize past the sequence length they were trained on. We present YaRN (Yet another RoPE extensioN method), a compute-efficient method to extend the context window of such models, requiring 10x fewer tokens and 2.5x fewer training steps than previous methods. Using YaRN, we show that LLaMA models can effectively utilize and extrapolate to context lengths much longer than their original pre-training would allow, while also surpassing the previous state-of-the-art in context window extension. In addition, we demonstrate that YaRN exhibits the capability to extrapolate beyond the limited context of a fine-tuning dataset. We publish the checkpoints of Llama 2 7B/13B fine-tuned using YaRN with 64k and 128k context windows at https://github.com/jquesnelle/yarn.
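
For context, the sketch below shows standard RoPE applied to query/key vectors together with a simple linear position-rescaling knob (the scale parameter), which is one common way RoPE-based context-window extension is implemented. It is an illustrative assumption only, not the actual YaRN formulation (YaRN adjusts the interpolation per frequency band), and the names rope_frequencies and apply_rope are hypothetical.

```python
import torch

def rope_frequencies(dim: int, base: float = 10000.0) -> torch.Tensor:
    # Per-pair rotation frequencies theta_i = base^(-2i/dim), as in standard RoPE.
    return base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)

def apply_rope(x: torch.Tensor, positions: torch.Tensor, scale: float = 1.0,
               base: float = 10000.0) -> torch.Tensor:
    # Rotate query/key vectors by position-dependent angles.
    # scale > 1 compresses positions (a simple interpolation-style extension);
    # this is NOT YaRN's per-frequency scheme, just a generic illustration.
    # x: (..., seq_len, dim) with dim even; positions: (seq_len,).
    dim = x.shape[-1]
    freqs = rope_frequencies(dim, base)                     # (dim/2,)
    angles = (positions[:, None] / scale) * freqs[None, :]  # (seq_len, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]                     # split into rotation pairs
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin                    # 2D rotation of each pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Example: a model trained on 4k positions reused at 8k by scaling positions by 2.
q = torch.randn(8192, 128)
q_rot = apply_rope(q, torch.arange(8192, dtype=torch.float32), scale=2.0)
```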