In-context Autoencoder for Context Compression in a Large Language Model
July 13, 2023
Authors: Tao Ge, Jing Hu, Xun Wang, Si-Qing Chen, Furu Wei
cs.AI
Abstract
We propose the In-context Autoencoder (ICAE) for context compression in a
large language model (LLM). The ICAE has two modules: a learnable encoder
adapted with LoRA from an LLM for compressing a long context into a limited
number of memory slots, and a fixed decoder which is the target LLM that can
condition on the memory slots for various purposes. We first pretrain the ICAE
using both autoencoding and language modeling objectives on massive text data,
enabling it to generate memory slots that accurately and comprehensively
represent the original context. Then, we fine-tune the pretrained ICAE on a
small amount of instruction data to enhance its interaction with various prompts
for producing desirable responses. Our experimental results demonstrate that
the ICAE learned with our proposed pretraining and fine-tuning paradigm can
effectively produce memory slots with 4× context compression, on which the
target LLM can condition well to respond to various prompts. The
promising results highlight the significance of the ICAE as a novel approach
to the long-context problem and its potential to reduce the computation and
memory overheads of LLM inference in practice, suggesting further research
effort on context management for LLMs. Our code and data
will be released shortly.
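
To make the two-module design concrete, below is a minimal, simplified sketch of the idea the abstract describes: learnable memory-slot embeddings are appended to a long context, a trainable encoder produces compressed memory from the slot positions, and a frozen decoder conditions on that memory. It uses plain PyTorch stand-ins rather than an actual LLM with LoRA adapters; the class name, layer sizes, slot count, and usage values are illustrative assumptions, not the authors' implementation.

```python
# Toy sketch of the ICAE idea (assumed simplification, not the paper's code).
import torch
import torch.nn as nn

class ToyICAE(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, num_memory_slots=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Learnable "memory slot" embeddings appended after the long context.
        self.memory_slots = nn.Parameter(torch.randn(num_memory_slots, d_model) * 0.02)
        # Stand-in for the LoRA-adapted LLM encoder (trainable).
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        # Stand-in for the fixed target-LLM decoder (kept frozen).
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)
        for p in list(self.decoder.parameters()) + list(self.lm_head.parameters()):
            p.requires_grad = False  # the target LLM decoder is not updated

    def compress(self, context_ids):
        # Encode [context tokens ; memory slots]; the hidden states at the
        # slot positions serve as the compressed representation of the context.
        ctx = self.embed(context_ids)                                    # (B, L, d)
        slots = self.memory_slots.unsqueeze(0).expand(ctx.size(0), -1, -1)
        hidden = self.encoder(torch.cat([ctx, slots], dim=1))
        return hidden[:, -self.memory_slots.size(0):, :]                 # (B, k, d)

    def forward(self, context_ids, target_ids):
        # The frozen decoder conditions on the memory slots, e.g. to
        # reconstruct the context (autoencoding) or to answer a prompt.
        memory = self.compress(context_ids)
        tgt = self.embed(target_ids)
        out = self.decoder(tgt, memory)
        return self.lm_head(out)                                         # (B, T, V)

# Usage with assumed toy sizes: 512 context tokens -> 128 memory slots (4x).
model = ToyICAE()
context = torch.randint(0, 32000, (1, 512))
targets = torch.randint(0, 32000, (1, 512))
logits = model(context, targets)
print(logits.shape)  # torch.Size([1, 512, 32000])
```

In this reading, the compression ratio is simply the context length divided by the number of memory slots (512 / 128 = 4 in the toy setting above), and only the slot embeddings and the encoder-side parameters would be trained during pretraining and instruction fine-tuning.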