Make Your LLM Fully Utilize the Context
April 25, 2024
Authors: Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou
cs.AI
Abstract
While many contemporary large language models (LLMs) can process lengthy
input, they still struggle to fully utilize information within the long
context, known as the lost-in-the-middle challenge. We hypothesize that it
stems from insufficient explicit supervision during the long-context training,
which fails to emphasize that any position in a long context can hold crucial
information. Based on this intuition, our study presents information-intensive
(IN2) training, a purely data-driven solution to overcome lost-in-the-middle.
Specifically, IN2 training leverages a synthesized long-context question-answer
dataset, where the answer requires (1) fine-grained information awareness on a
short segment (~128 tokens) within a synthesized long context (4K-32K tokens),
and (2) the integration and reasoning of information from two or more short
segments. By applying this information-intensive training to Mistral-7B,
we present FILM-7B (FILl-in-the-Middle). To thoroughly assess the ability of
FILM-7B to utilize long contexts, we design three probing tasks that
encompass various context styles (document, code, and structured-data context)
and information retrieval patterns (forward, backward, and bi-directional
retrieval). The probing results demonstrate that FILM-7B can robustly retrieve
information from different positions in its 32K context window. Beyond these
probing tasks, FILM-7B significantly improves the performance on real-world
long-context tasks (e.g., 23.5->26.9 F1 score on NarrativeQA), while
maintaining comparable performance on short-context tasks (e.g., 59.3->59.2
accuracy on MMLU). GitHub link: https://github.com/microsoft/FILM.
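The probing tasks can likewise be pictured as a depth sweep: plant a known record at a controlled relative position in a long context and check whether the model retrieves it. The sketch below is a hedged, structured-data-style illustration; `make_kv_probe`, `probe`, and the key-value record format are hypothetical stand-ins for the paper's actual probing suite, and `model_answer_fn` is a placeholder for whatever inference call you use.

```python
import random
import string

def make_kv_probe(depth, num_records=800, seed=0):
    """Structured-data probing context: random key-value records with one
    'needle' record placed at a chosen relative depth (0.0 = start, 1.0 = end).
    Returns (context, needle_key, needle_value)."""
    rng = random.Random(seed)
    rand = lambda n: "".join(rng.choices(string.ascii_lowercase, k=n))
    records = [f"{rand(12)}: {rand(12)}" for _ in range(num_records)]
    key, value = rand(12), rand(12)
    records.insert(int(depth * num_records), f"{key}: {value}")
    return "\n".join(records), key, value

def probe(model_answer_fn, depths=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Sweep the needle across depths and report whether the model's answer
    contains the target value."""
    for d in depths:
        context, key, value = make_kv_probe(d)
        answer = model_answer_fn(context, f"What is the value for key '{key}'?")
        print(f"depth={d:.2f} retrieved={value in answer}")
```

A flat accuracy curve across depths is the behavior the abstract reports for FILM-7B, whereas lost-in-the-middle models typically dip at intermediate depths.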