Extending Context Window of Large Language Models via Positional Interpolation
June 27, 2023
Authors: Shouyuan Chen, Sherman Wong, Liangjian Chen, Yuandong Tian
cs.AI
Abstract
We present Position Interpolation (PI) that extends the context window sizes
of RoPE-based pretrained LLMs such as LLaMA models to up to 32768 with minimal
fine-tuning (within 1000 steps), while demonstrating strong empirical results
on various tasks that require long context, including passkey retrieval,
language modeling, and long document summarization from LLaMA 7B to 65B.
Meanwhile, models extended by Position Interpolation preserve quality
relatively well on tasks within their original context window. To achieve this
goal, Position Interpolation linearly down-scales the input position indices to
match the original context window size, rather than extrapolating beyond the
trained context length, which may lead to catastrophically high attention scores
that completely ruin the self-attention mechanism. Our theoretical study shows
that the upper bound of interpolation is at least ~600 times smaller
than that of extrapolation, further demonstrating its stability. Models
extended via Position Interpolation retain their original architecture and can
reuse most pre-existing optimization and infrastructure.
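
The core mechanism can be illustrated with a minimal sketch, assuming a standard RoPE formulation: instead of feeding position indices beyond the trained window (extrapolation), the indices are linearly down-scaled so the extended sequence still maps into the original range. The function names, tensor shapes, and the 2048-to-8192 window sizes below are illustrative assumptions, not the paper's reference implementation.

```python
# Minimal sketch of Position Interpolation (PI) for RoPE.
# Assumes a standard pairwise-rotation RoPE; not the paper's reference code.
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0) -> torch.Tensor:
    """Rotation angles for (possibly fractional) positions, shape (seq_len, dim/2)."""
    inv_freq = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    return positions[:, None].float() * inv_freq[None, :]

def apply_rope(x: torch.Tensor, positions: torch.Tensor) -> torch.Tensor:
    """Rotate query/key vectors x of shape (seq_len, dim) by their position angles."""
    angles = rope_angles(positions, x.shape[-1])
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Position Interpolation: linearly down-scale position indices so the
# extended sequence stays within the pretrained position range.
original_window = 2048                    # context length used in pretraining (assumed)
extended_window = 8192                    # target context length after extension (assumed)
scale = original_window / extended_window

seq_len, head_dim = extended_window, 64
q = torch.randn(seq_len, head_dim)

positions = torch.arange(seq_len)         # 0 .. 8191: beyond the trained range (extrapolation)
interpolated = positions * scale          # 0 .. 2047.75: stays in-range (interpolation)
q_rotated = apply_rope(q, interpolated)   # RoPE accepts fractional positions directly
```

Because only the position indices change, the model architecture and attention computation are untouched, which is why existing optimizations and infrastructure can be reused after extension.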