Make Your LLM Fully Utilize the Context

Abstract

While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within a long context, a problem known as the lost-in-the-middle challenge. We hypothesize that this stems from insufficient explicit supervision during long-context training, which fails to emphasize that any position in a long context can hold crucial information. Based on this intuition, we present information-intensive (IN2) training, a purely data-driven solution to overcome lost-in-the-middle. Specifically, IN2 training leverages a synthesized long-context question-answer dataset in which answering requires (1) fine-grained information awareness of a short segment (~128 tokens) within a synthesized long context (4K-32K tokens), and (2) the integration of, and reasoning over, information from two or more short segments. By applying this information-intensive training to Mistral-7B, we obtain FILM-7B (FILl-in-the-Middle). To thoroughly assess FILM-7B's ability to utilize long contexts, we design three probing tasks that cover various context styles (document, code, and structured-data contexts) and information retrieval patterns (forward, backward, and bi-directional retrieval). The probing results show that FILM-7B can robustly retrieve information from different positions in its 32K context window. Beyond these probing tasks, FILM-7B significantly improves performance on real-world long-context tasks (e.g., F1 on NarrativeQA rises from 23.5 to 26.9) while maintaining comparable performance on short-context tasks (e.g., accuracy on MMLU goes from 59.3 to 59.2). GitHub link: https://github.com/microsoft/FILM.
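The abstract describes the IN2 data only at a high level. As a hedged illustration, the sketch below shows one way an IN2-style long context could be assembled: answer-bearing short segments are placed at random positions among unrelated filler segments until a token budget (4K-32K tokens in the paper) is reached, so the crucial information can sit anywhere in the context. The names `count_tokens` and `build_in2_context`, the whitespace-based token counter, and the filler text are all hypothetical assumptions for illustration; this is not the code in the FILM repository.

```python
import random
from typing import List, Tuple


def count_tokens(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


def build_in2_context(
    key_segments: List[str],
    filler_segments: List[str],
    target_tokens: int,
    rng: random.Random,
) -> Tuple[str, List[int]]:
    """Scatter answer-bearing segments among randomly drawn filler segments.

    Returns the joined long context and the segment index of each key segment.
    Key segments keep their relative order; only their positions are randomized.
    """
    key_tokens = sum(count_tokens(s) for s in key_segments)
    pool = filler_segments[:]
    rng.shuffle(pool)

    # Draw fillers until keys + fillers reach the budget (or the pool runs out).
    fillers: List[str] = []
    total = key_tokens
    for seg in pool:
        if total >= target_tokens:
            break
        fillers.append(seg)
        total += count_tokens(seg)

    # Pick distinct random slots for the key segments.
    n_slots = len(fillers) + len(key_segments)
    key_slots = sorted(rng.sample(range(n_slots), len(key_segments)))
    key_slot_set = set(key_slots)

    segments: List[str] = []
    key_iter = iter(key_segments)
    filler_iter = iter(fillers)
    for i in range(n_slots):
        segments.append(next(key_iter) if i in key_slot_set else next(filler_iter))
    return "\n\n".join(segments), key_slots


# Hypothetical usage: roughly 128-token filler segments and one key segment.
rng = random.Random(0)
fillers = [f"filler segment {i} " + "word " * 120 for i in range(500)]
keys = ["The vault code is 4812. " + "detail " * 100]
context, slots = build_in2_context(keys, fillers, target_tokens=4096, rng=rng)
example = {"context": context, "question": "What is the vault code?", "answer": "4812"}
```

Passing two or more key segments yields the multi-segment variant described in the abstract: the facts needed for reasoning stay in order, but their positions in the long context vary from sample to sample.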
