FocusLLM: Scaling LLM's Context by Parallel Decoding
August 21, 2024
Authors: Zhenyu Li, Yike Zhang, Tengyu Pan, Yutao Sun, Zhichao Duan, Junjie Fang, Rong Han, Zixuan Wang, Jianyong Wang
cs.AI
Abstract
Empowering LLMs with the ability to utilize useful information from a long
context is crucial for many downstream applications. However, achieving long
context lengths with the conventional transformer architecture requires
substantial training and inference resources. In this paper, we present
FocusLLM, a framework designed to extend the context length of any decoder-only
LLM, enabling the model to focus on relevant information from very long
sequences. FocusLLM processes long text inputs by dividing them into chunks
based on the model's original context length to alleviate the issue of
attention distraction. Then, it appends the local context to each chunk as a
prompt to extract essential information from each chunk based on a novel
parallel decoding mechanism, and ultimately integrates the extracted
information into the local context. FocusLLM stands out for its training
efficiency and versatility: trained on an 8K input length at a much lower
training cost than previous methods, FocusLLM exhibits superior performance
across downstream long-context tasks and maintains strong language modeling
ability when handling extensive long texts, even up to 400K tokens. Our code is
available at https://github.com/leezythu/FocusLLM.
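To make the described pipeline concrete, below is a minimal Python sketch of the chunk-and-prompt layout the abstract outlines: split the long input into chunks sized to the model's original context window, append the local context to each chunk, process the chunks in parallel, and fold the extracted information back into the local context. This is not the authors' implementation; the helper names (`split_into_chunks`, `extract_from_chunk`, `focus_style_decode`), the thread-based parallelism, and the toy "extraction" step are illustrative assumptions. In FocusLLM itself, extraction is performed by the decoder through the paper's parallel decoding mechanism, which is not reproduced here.

```python
# Illustrative sketch only: `extract_from_chunk` stands in for FocusLLM's
# parallel decoding step. The real method runs the decoder over every
# (chunk + local context) segment and merges the results.
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_chunks(tokens: List[int], chunk_size: int) -> List[List[int]]:
    """Divide a long token sequence into chunks no longer than the model's
    original context length."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def extract_from_chunk(chunk: List[int], local_context: List[int]) -> List[int]:
    """Placeholder: append the local context to a chunk and pretend the last
    few tokens summarize it. A real implementation would call the LLM here."""
    prompt = chunk + local_context
    return prompt[-8:]


def focus_style_decode(long_tokens: List[int], local_context: List[int],
                       chunk_size: int = 4096) -> List[int]:
    """Process every chunk independently (here via threads) and integrate the
    extracted information back into the local context."""
    chunks = split_into_chunks(long_tokens, chunk_size)
    with ThreadPoolExecutor() as pool:
        extracted = list(pool.map(
            lambda c: extract_from_chunk(c, local_context), chunks))
    # Prepend the per-chunk extractions to the local context.
    merged = [tok for piece in extracted for tok in piece] + local_context
    return merged


if __name__ == "__main__":
    long_input = list(range(20_000))   # stand-in for a 20K-token document
    local_ctx = list(range(100))       # stand-in for the local context
    print(len(focus_style_decode(long_input, local_ctx, chunk_size=4096)))
```

The point of the sketch is the data flow: because each chunk is handled independently, the per-chunk work can run in parallel and the sequence length seen by the model never exceeds its original context window.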