FocusLLM: Scaling LLM's Context by Parallel Decoding
August 21, 2024
Authors: Zhenyu Li, Yike Zhang, Tengyu Pan, Yutao Sun, Zhichao Duan, Junjie Fang, Rong Han, Zixuan Wang, Jianyong Wang
cs.AI
Abstract
Empowering LLMs with the ability to utilize useful information from a long
context is crucial for many downstream applications. However, achieving long
context lengths with the conventional transformer architecture requires
substantial training and inference resources. In this paper, we present
FocusLLM, a framework designed to extend the context length of any decoder-only
LLM, enabling the model to focus on relevant information from very long
sequences. FocusLLM processes long text inputs by dividing them into chunks
based on the model's original context length to alleviate the issue of
attention distraction. Then, it appends the local context to each chunk as a
prompt to extract essential information from each chunk based on a novel
parallel decoding mechanism, and ultimately integrates the extracted
information into the local context. FocusLLM stands out for its training
efficiency and versatility: trained on an 8K input length at a much lower
training cost than previous methods, FocusLLM exhibits superior performance
across downstream long-context tasks and maintains strong language modeling
ability when handling extensive long texts, even up to 400K tokens. Our code is
available at https://github.com/leezythu/FocusLLM.
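To make the described pipeline concrete, below is a minimal Python sketch of the chunk-and-prompt layout the abstract outlines: split the long input into chunks sized to the model's original context window, append the local context to each chunk, process the chunks in parallel, and fold the extracted information back into the local context. This is not the authors' implementation; the helper names (`split_into_chunks`, `extract_from_chunk`, `focus_style_decode`), the thread-based parallelism, and the toy "extraction" step are illustrative assumptions. In FocusLLM itself, extraction is performed by the decoder through the paper's parallel decoding mechanism, which is not reproduced here.

```python
# Illustrative sketch only: `extract_from_chunk` stands in for FocusLLM's
# parallel decoding step. The real method runs the decoder over every
# (chunk + local context) segment and merges the results.
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_chunks(tokens: List[int], chunk_size: int) -> List[List[int]]:
    """Divide a long token sequence into chunks no longer than the model's
    original context length."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def extract_from_chunk(chunk: List[int], local_context: List[int]) -> List[int]:
    """Placeholder: append the local context to a chunk and pretend the last
    few tokens summarize it. A real implementation would call the LLM here."""
    prompt = chunk + local_context
    return prompt[-8:]


def focus_style_decode(long_tokens: List[int], local_context: List[int],
                       chunk_size: int = 4096) -> List[int]:
    """Process every chunk independently (here via threads) and integrate the
    extracted information back into the local context."""
    chunks = split_into_chunks(long_tokens, chunk_size)
    with ThreadPoolExecutor() as pool:
        extracted = list(pool.map(
            lambda c: extract_from_chunk(c, local_context), chunks))
    # Prepend the per-chunk extractions to the local context.
    merged = [tok for piece in extracted for tok in piece] + local_context
    return merged


if __name__ == "__main__":
    long_input = list(range(20_000))   # stand-in for a 20K-token document
    local_ctx = list(range(100))       # stand-in for the local context
    print(len(focus_style_decode(long_input, local_ctx, chunk_size=4096)))
```

The point of the sketch is the data flow: because each chunk is handled independently, the per-chunk work can run in parallel and the sequence length seen by the model never exceeds its original context window.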