

FocusLLM: Scaling LLM's Context by Parallel Decoding

August 21, 2024
作者: Zhenyu Li, Yike Zhang, Tengyu Pan, Yutao Sun, Zhichao Duan, Junjie Fang, Rong Han, Zixuan Wang, Jianyong Wang
cs.AI

Abstract

Empowering LLMs with the ability to utilize useful information from a long context is crucial for many downstream applications. However, achieving long context lengths with the conventional transformer architecture requires substantial training and inference resources. In this paper, we present FocusLLM, a framework designed to extend the context length of any decoder-only LLM, enabling the model to focus on relevant information from very long sequences. FocusLLM processes long text inputs by dividing them into chunks based on the model's original context length to alleviate the issue of attention distraction. It then appends the local context to each chunk as a prompt, extracts essential information from each chunk via a novel parallel decoding mechanism, and ultimately integrates the extracted information into the local context. FocusLLM stands out for its training efficiency and versatility: trained with an 8K input length at much lower training cost than previous methods, it exhibits superior performance across downstream long-context tasks and maintains strong language modeling ability on extensive long texts of up to 400K tokens. Our code is available at https://github.com/leezythu/FocusLLM.
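
The mechanism described in the abstract — chunking the long input to the model's original context length, pairing each chunk with the local context, extracting key information from each chunk, and folding that information back into the local context before decoding — can be illustrated with a minimal sketch. This is a hedged toy in plain Python, not the FocusLLM implementation: the names `split_into_chunks`, `focus_decode`, and `extract_from_chunk` are hypothetical stand-ins, and the real parallel decoding operates on hidden states inside the model (see the linked repository).

```python
# Minimal sketch of the chunk-then-aggregate flow described in the abstract.
# All function names here are hypothetical stand-ins, not the FocusLLM API.

from typing import List


def split_into_chunks(tokens: List[int], chunk_size: int) -> List[List[int]]:
    """Divide a long token sequence into chunks no longer than the model's
    original context length."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def extract_from_chunk(chunk_with_context: List[int]) -> List[int]:
    """Placeholder: a real implementation would run the decoder over the chunk
    (with the local context appended as a prompt) and keep only a compact set
    of extracted representations."""
    return chunk_with_context[:4]  # toy stand-in: keep the first few tokens


def focus_decode(long_tokens: List[int],
                 local_context: List[int],
                 chunk_size: int = 8192) -> List[int]:
    """Append the local context to each chunk, extract key information from
    each chunk independently (the step FocusLLM runs in parallel), and
    integrate the extracted information back into the local context."""
    chunks = split_into_chunks(long_tokens, chunk_size)

    # In FocusLLM these per-chunk passes run in parallel; they are written
    # sequentially here for clarity.
    extracted = [extract_from_chunk(chunk + local_context) for chunk in chunks]

    # Integrate the per-chunk information ahead of the local context, yielding
    # a short combined input that a standard decoder-only LLM can handle.
    combined = [tok for piece in extracted for tok in piece] + local_context
    return combined


if __name__ == "__main__":
    long_input = list(range(30000))      # pretend this is a tokenized long document
    local_ctx = list(range(100, 110))    # the recent local context
    print(len(focus_decode(long_input, local_ctx, chunk_size=8192)))
```

The point of the sketch is the shape of the computation: each chunk is processed independently against the same local context, so the per-chunk work can be parallelized, and only the compact extracted information (not the full long sequence) is fed into the final decoding step.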

