LLoCO: Learning Long Contexts Offline
April 11, 2024
Authors: Sijun Tan, Xiuyu Li, Shishir Patil, Ziyang Wu, Tianjun Zhang, Kurt Keutzer, Joseph E. Gonzalez, Raluca Ada Popa
cs.AI
Abstract
Processing long contexts remains a challenge for large language models (LLMs)
due to the quadratic computational and memory overhead of the self-attention
mechanism and the substantial KV cache sizes during generation. We propose a
novel approach to address this problem by learning contexts offline through
context compression and in-domain parameter-efficient finetuning. Our method
enables an LLM to create a concise representation of the original context and
efficiently retrieve relevant information to answer questions accurately. We
introduce LLoCO, a technique that combines context compression, retrieval, and
parameter-efficient finetuning using LoRA. Our approach extends the effective
context window of a 4k token LLaMA2-7B model to handle up to 128k tokens. We
evaluate our approach on several long-context question-answering datasets,
demonstrating that LLoCO significantly outperforms in-context learning while
using 30× fewer tokens during inference. LLoCO achieves up to a
7.62× speed-up and substantially reduces the cost of long-document
question answering, making it a promising solution for efficient long context
processing. Our code is publicly available at
https://github.com/jeffreysijuntan/lloco.
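The abstract describes LLoCO as combining offline context compression with parameter-efficient finetuning via LoRA. The snippet below is a minimal, hypothetical sketch of just the LoRA-finetuning step, using the Hugging Face transformers and peft libraries; the rank, target modules, and placeholder comments are illustrative assumptions, not the paper's actual configuration (see the linked repository for the real implementation).

```python
# Hypothetical sketch of LLoCO's parameter-efficient finetuning step:
# attach LoRA adapters to a LLaMA-2-7B base model so that only a small
# set of adapter weights is trained on in-domain data, while the base
# model (and, in LLoCO, the compressed context representations) stay fixed.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA hyperparameters here are common defaults, not the paper's values.
lora_cfg = LoraConfig(
    r=8,                                  # low-rank adapter dimension
    lora_alpha=16,                        # adapter scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections, a typical choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the LoRA weights are trainable
```

Because only the adapter weights are updated, finetuning on compressed in-domain contexts is cheap relative to full finetuning, which is what makes the offline "learn the context once, reuse it at inference" workflow practical.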