Make Your LLM Fully Utilize the Context
April 25, 2024
Authors: Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou
cs.AI
Abstract
While many contemporary large language models (LLMs) can process lengthy
input, they still struggle to fully utilize information within the long
context, known as the lost-in-the-middle challenge. We hypothesize that it
stems from insufficient explicit supervision during the long-context training,
which fails to emphasize that any position in a long context can hold crucial
information. Based on this intuition, our study presents information-intensive
(IN2) training, a purely data-driven solution to overcome lost-in-the-middle.
Specifically, IN2 training leverages a synthesized long-context question-answer
dataset, where the answer requires (1) fine-grained information awareness on a
short segment (~128 tokens) within a synthesized long context (4K-32K tokens),
and (2) the integration and reasoning of information from two or more short
segments. By applying this information-intensive training to Mistral-7B,
we present FILM-7B (FILl-in-the-Middle). To thoroughly assess the ability of
FILM-7B to utilize long contexts, we design three probing tasks that
encompass various context styles (document, code, and structured-data context)
and information retrieval patterns (forward, backward, and bi-directional
retrieval). The probing results demonstrate that FILM-7B can robustly retrieve
information from different positions in its 32K context window. Beyond these
probing tasks, FILM-7B significantly improves the performance on real-world
long-context tasks (e.g., 23.5->26.9 F1 score on NarrativeQA), while
maintaining comparable performance on short-context tasks (e.g., 59.3->59.2
accuracy on MMLU). GitHub link: https://github.com/microsoft/FILM.
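The probing tasks can likewise be pictured as a depth sweep: plant a known record at a controlled relative position in a long context and check whether the model retrieves it. The sketch below is a hedged, structured-data-style illustration; `make_kv_probe`, `probe`, and the key-value record format are hypothetical stand-ins for the paper's actual probing suite, and `model_answer_fn` is a placeholder for whatever inference call you use.

```python
import random
import string

def make_kv_probe(depth, num_records=800, seed=0):
    """Structured-data probing context: random key-value records with one
    'needle' record placed at a chosen relative depth (0.0 = start, 1.0 = end).
    Returns (context, needle_key, needle_value)."""
    rng = random.Random(seed)
    rand = lambda n: "".join(rng.choices(string.ascii_lowercase, k=n))
    records = [f"{rand(12)}: {rand(12)}" for _ in range(num_records)]
    key, value = rand(12), rand(12)
    records.insert(int(depth * num_records), f"{key}: {value}")
    return "\n".join(records), key, value

def probe(model_answer_fn, depths=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Sweep the needle across depths and report whether the model's answer
    contains the target value."""
    for d in depths:
        context, key, value = make_kv_probe(d)
        answer = model_answer_fn(context, f"What is the value for key '{key}'?")
        print(f"depth={d:.2f} retrieved={value in answer}")
```

A flat accuracy curve across depths is the behavior the abstract reports for FILM-7B, whereas lost-in-the-middle models typically dip at intermediate depths.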