LLoCO: Learning Long Contexts Offline
April 11, 2024
Authors: Sijun Tan, Xiuyu Li, Shishir Patil, Ziyang Wu, Tianjun Zhang, Kurt Keutzer, Joseph E. Gonzalez, Raluca Ada Popa
cs.AI
Abstract
Processing long contexts remains a challenge for large language models (LLMs)
due to the quadratic computational and memory overhead of the self-attention
mechanism and the substantial KV cache sizes during generation. We propose a
novel approach to address this problem by learning contexts offline through
context compression and in-domain parameter-efficient finetuning. Our method
enables an LLM to create a concise representation of the original context and
efficiently retrieve relevant information to answer questions accurately. We
introduce LLoCO, a technique that combines context compression, retrieval, and
parameter-efficient finetuning using LoRA. Our approach extends the effective
context window of a 4k token LLaMA2-7B model to handle up to 128k tokens. We
evaluate our approach on several long-context question-answering datasets,
demonstrating that LLoCO significantly outperforms in-context learning while
using 30× fewer tokens during inference. LLoCO achieves up to a
7.62× speed-up and substantially reduces the cost of long-document
question answering, making it a promising solution for efficient long context
processing. Our code is publicly available at
https://github.com/jeffreysijuntan/lloco.
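The abstract describes LLoCO as combining offline context compression with parameter-efficient finetuning via LoRA. The snippet below is a minimal, hypothetical sketch of just the LoRA-finetuning step, using the Hugging Face transformers and peft libraries; the rank, target modules, and placeholder comments are illustrative assumptions, not the paper's actual configuration (see the linked repository for the real implementation).

```python
# Hypothetical sketch of LLoCO's parameter-efficient finetuning step:
# attach LoRA adapters to a LLaMA-2-7B base model so that only a small
# set of adapter weights is trained on in-domain data, while the base
# model (and, in LLoCO, the compressed context representations) stay fixed.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA hyperparameters here are common defaults, not the paper's values.
lora_cfg = LoraConfig(
    r=8,                                  # low-rank adapter dimension
    lora_alpha=16,                        # adapter scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections, a typical choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the LoRA weights are trainable
```

Because only the adapter weights are updated, finetuning on compressed in-domain contexts is cheap relative to full finetuning, which is what makes the offline "learn the context once, reuse it at inference" workflow practical.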