Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon

Abstract

The utilization of long contexts poses a major challenge for large language models because of their limited context window length. Although the context window can be extended through fine-tuning, doing so incurs considerable cost at both training and inference time and can adversely affect the LLM's original capabilities. In this work, we propose Activation Beacon, which condenses the LLM's raw activations into more compact forms so that the model can perceive a much longer context within a limited context window. Activation Beacon is introduced as a plug-and-play module for the LLM. It fully preserves the LLM's original capability on short contexts while adding the new capability of processing longer contexts. In addition, it processes the long context with short sliding windows, which achieves competitive memory and time efficiency in both training and inference. Activation Beacon is trained on an auto-regression task conditioned on a mixture of beacons with diversified condensing ratios. Thanks to this treatment, it can be trained efficiently on short-sequence data alone in just 10K steps, which takes less than 9 hours on a single 8xA800 GPU machine. The experimental studies show that Activation Beacon can extend Llama-2-7B's context length by 100 times (from 4K to 400K) while achieving superior results on both long-context generation and understanding tasks. Our model and code will be available at the BGE repository.
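As a rough, non-authoritative illustration of why condensing activations lets a fixed window cover far more text, the sketch below does the bookkeeping under stated assumptions. The function name `max_context_length`, the sliding-window length of 1024 tokens, and the condensing ratio of 128 are hypothetical choices for illustration; the abstract itself gives only the 4K window and the 400K target. The sketch assumes every earlier sliding window is kept only as its condensed activations, while the current window is kept in raw form.

```python
def max_context_length(context_window: int, sliding_window: int, ratio: int) -> int:
    """Estimate how many raw tokens can be covered by a fixed context window.

    Assumptions (illustrative, not taken from the abstract):
    - the input is processed in sliding windows of `sliding_window` tokens;
    - each finished window is condensed into ceil(sliding_window / ratio)
      compact activations;
    - the current window is kept uncondensed.
    """
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    if not 0 < sliding_window <= context_window:
        raise ValueError("sliding_window must be in (0, context_window]")

    # Slots left for condensed activations once the current raw window is placed.
    budget = context_window - sliding_window
    # Condensed activations produced per finished window (ceiling division).
    condensed_per_window = -(-sliding_window // ratio)
    # Number of earlier windows whose condensed activations still fit.
    past_windows = budget // condensed_per_window
    return (past_windows + 1) * sliding_window


if __name__ == "__main__":
    for r in (1, 8, 128):
        print(r, max_context_length(4096, 1024, r))
```

With a ratio of 1 (no condensing) the estimate stays at the 4K window itself. With the assumed ratio of 128 it reaches 394,240 tokens, the same order of magnitude as the 400K reported above.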
