ARC-Encoder: learning compressed text representations for large language models

Abstract

Recent techniques such as retrieval-augmented generation and chain-of-thought reasoning have led to longer contexts and higher inference costs. Context compression techniques can reduce these costs, but the most effective approaches require fine-tuning the target model or even modifying its architecture. This can degrade the model's general abilities when it is used outside that specific purpose. Here we explore an alternative approach: an encoder that compresses the context into continuous representations which replace token embeddings in decoder LLMs. First, we perform a systematic study of training strategies and architecture choices for the encoder. Our findings led to the design of an Adaptable text Representations Compressor, named ARC-Encoder, which outputs x times fewer continuous representations than text tokens (typically x ∈ {4, 8}). We evaluate ARC-Encoder across a variety of LLM usage scenarios, ranging from in-context learning to context window extension, on both instruct and base decoders. Results show that ARC-Encoder achieves state-of-the-art performance on several benchmarks while improving computational efficiency at inference. Finally, we demonstrate that our models can be adapted to multiple decoders simultaneously, allowing a single encoder to generalize across different decoder LLMs. This makes ARC-Encoder a flexible and efficient solution for portable encoders that work seamlessly with multiple LLMs. We release the training code at https://github.com/kyutai-labs/ARC-Encoder ; the fine-tuning dataset and pretrained models are available at https://huggingface.co/collections/kyutai/arc-encoders-68ee18787301407d60a57047 .
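To make the compression factor concrete, the following is a minimal sketch of how a compressed context could stand in for token embeddings at the decoder input. It is not the ARC-Encoder implementation: the learned encoder is replaced here by plain mean pooling over groups of x vectors, and the function names (`compressed_length`, `mean_pool`, `build_decoder_inputs`) as well as the rounding-up of partial groups are assumptions made purely for illustration.

```python
from math import ceil


def compressed_length(num_context_tokens: int, x: int) -> int:
    """Number of continuous representations for an x-fold compression.

    Rounding up for a partial final group is an assumption of this sketch.
    """
    if x < 1:
        raise ValueError("compression factor x must be >= 1")
    return ceil(num_context_tokens / x)


def mean_pool(embeddings: list[list[float]], x: int) -> list[list[float]]:
    """Average consecutive groups of x vectors.

    Placeholder for a learned encoder; mean pooling is NOT claimed to be
    the method used by ARC-Encoder.
    """
    pooled = []
    for start in range(0, len(embeddings), x):
        group = embeddings[start:start + x]
        dim = len(group[0])
        pooled.append([sum(vec[d] for vec in group) / len(group) for d in range(dim)])
    return pooled


def build_decoder_inputs(
    context_embeddings: list[list[float]],
    prompt_embeddings: list[list[float]],
    x: int,
) -> list[list[float]]:
    """Replace the context's token embeddings with compressed vectors,
    then append the uncompressed prompt embeddings."""
    return mean_pool(context_embeddings, x) + prompt_embeddings


# Toy example: 8 context vectors and 3 prompt vectors, all of dimension 2.
context = [[float(i), 2.0 * i] for i in range(8)]
prompt = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]

inputs = build_decoder_inputs(context, prompt, x=4)
print(len(context) + len(prompt), "positions without compression")
print(len(inputs), "positions with x = 4")
```

In this toy setting the decoder would process 5 input positions instead of 11, since the 8 context vectors shrink to 2. With x = 8 the same context would shrink to a single vector.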
