InfiniPot: Infinite Context Processing on Memory-Constrained LLMs

Abstract

Handling long input contexts remains a significant challenge for Large Language Models (LLMs), particularly in resource-constrained environments such as mobile devices. Our work addresses this limitation by introducing InfiniPot, a novel KV cache control framework that enables pre-trained LLMs to efficiently manage extensive sequences within fixed memory constraints, without requiring additional training. InfiniPot leverages Continual Context Distillation (CCD), an iterative process that compresses and retains essential information through novel importance metrics, effectively maintaining critical data even without access to future context. Our comprehensive evaluations indicate that InfiniPot significantly outperforms models trained for long contexts in various NLP tasks, establishing its efficacy and versatility. This work represents a substantial advancement toward making LLMs applicable to a broader range of real-world scenarios.
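As a rough illustration of what iterative compression under a fixed memory budget can look like, the following minimal sketch keeps a cache of at most `capacity` entries and, whenever the cache fills, retains only the `keep` highest-scoring ones. It is not the paper's implementation: the names `compress`, `process_stream`, `capacity`, and `keep` are hypothetical, and the per-token score passed in is a placeholder standing in for the importance metrics that CCD defines. Note that each compression step only sees tokens processed so far, which mirrors the constraint of having no access to future context.

```python
from typing import List, Sequence, Tuple

# (token position, placeholder importance score); the score is an assumption,
# not the importance metric proposed in the paper.
Entry = Tuple[int, float]


def compress(cache: List[Entry], keep: int) -> List[Entry]:
    """Keep the `keep` highest-scoring entries, then restore positional order."""
    top = sorted(cache, key=lambda e: e[1], reverse=True)[:keep]
    return sorted(top, key=lambda e: e[0])


def process_stream(scores: Sequence[float], capacity: int, keep: int) -> List[Entry]:
    """Feed tokens one at a time; compress whenever the cache reaches capacity."""
    if not 0 < keep < capacity:
        raise ValueError("need 0 < keep < capacity")
    cache: List[Entry] = []
    for pos, score in enumerate(scores):
        cache.append((pos, score))
        if len(cache) >= capacity:
            # Only entries seen so far are ranked; later tokens are unknown here.
            cache = compress(cache, keep)
    return cache


if __name__ == "__main__":
    toy_scores = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6]
    print(process_stream(toy_scores, capacity=4, keep=2))
```

However long the input stream is, the cache in this sketch never holds more than `capacity` entries, which is the property a fixed memory budget requires; the quality of what survives depends entirely on the scoring function, which is left abstract here.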

