Make Your LLM Fully Utilize the Context
April 25, 2024
Authors: Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou
cs.AI
Abstract
While many contemporary large language models (LLMs) can process lengthy
input, they still struggle to fully utilize information within the long
context, known as the lost-in-the-middle challenge. We hypothesize that it
stems from insufficient explicit supervision during long-context training,
which fails to emphasize that any position in a long context can hold crucial
information. Based on this intuition, our study presents information-intensive
(IN2) training, a purely data-driven solution to overcome lost-in-the-middle.
Specifically, IN2 training leverages a synthesized long-context question-answer
dataset, where the answer requires (1) fine-grained information awareness on a
short segment (~128 tokens) within a synthesized long context (4K-32K tokens),
and (2) the integration and reasoning of information from two or more short
segments. By applying this information-intensive training to Mistral-7B, we
present FILM-7B (FILl-in-the-Middle). To thoroughly assess FILM-7B's ability
to utilize long contexts, we design three probing tasks that
encompass various context styles (document, code, and structured-data context)
and information retrieval patterns (forward, backward, and bi-directional
retrieval). The probing results demonstrate that FILM-7B can robustly retrieve
information from different positions in its 32K context window. Beyond these
probing tasks, FILM-7B significantly improves the performance on real-world
long-context tasks (e.g., 23.5->26.9 F1 score on NarrativeQA), while
maintaining comparable performance on short-context tasks (e.g., 59.3->59.2
accuracy on MMLU). GitHub link: https://github.com/microsoft/FILM.
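
To make the IN2 data construction concrete, below is a minimal Python sketch of how one training example of this kind might be synthesized: a short key segment is embedded at a uniformly random position among filler segments, so that supervision can target any depth of the long context. The helper names (`make_qa`, `build_in2_example`) and the toy QA generator are illustrative assumptions, not the paper's actual pipeline, which generates grounded question-answer pairs with a stronger LLM.

```python
import random

def make_qa(segment: str) -> tuple[str, str]:
    # Toy stand-in for QA generation; the paper derives grounded QA pairs
    # from the key segment with a strong instruction-following LLM. Here the
    # "question" simply points at the segment and the "answer" restates it.
    return (f"What fact is stated in the passage beginning '{segment[:32]}'?",
            segment)

def build_in2_example(key_segment: str, filler_pool: list[str],
                      n_segments: int, rng: random.Random) -> dict:
    # Sample unrelated fillers and drop the ~128-token key segment at a
    # uniformly random position, so any position in the long context
    # (4K-32K tokens in the paper) can hold the crucial information.
    fillers = rng.sample(filler_pool, n_segments - 1)
    fillers.insert(rng.randrange(n_segments), key_segment)
    question, answer = make_qa(key_segment)
    return {"context": "\n\n".join(fillers),
            "question": question,
            "answer": answer}

rng = random.Random(0)
pool = [f"Filler passage {i} about an unrelated topic." for i in range(1000)]
example = build_in2_example("The launch code is 7421.", pool,
                            n_segments=256, rng=rng)
```

The second requirement of IN2 training (integrating two or more segments) would extend this by embedding several key segments and asking a question whose answer spans all of them.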
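The probing tasks admit a similarly small sketch. The example below builds a structured-data probe in the spirit described in the abstract: a long list of key-value records with the queried record placed at a controlled relative depth, where "forward" retrieval asks for the value given the key and "backward" asks for the key given the value (a bi-directional probe would combine both). The record format and function name are assumptions for illustration, not the paper's released probes.

```python
import random

def build_kv_probe(n_pairs: int, needle_depth: float,
                   direction: str, rng: random.Random) -> dict:
    # Generate random key-value records; the queried record sits at a
    # controlled relative depth in [0, 1] so retrieval robustness can be
    # measured position by position across the context window.
    records = [(f"key-{rng.getrandbits(32):08x}",
                f"val-{rng.getrandbits(32):08x}")
               for _ in range(n_pairs)]
    key, value = records[int(needle_depth * (n_pairs - 1))]
    context = "\n".join(f"{k}: {v}" for k, v in records)
    if direction == "forward":   # key -> value
        question, answer = f"What is the value of {key}?", value
    else:                        # backward: value -> key
        question, answer = f"Which key has the value {value}?", key
    return {"context": context, "question": question, "answer": answer}

# Sweep depths 0.0 -> 1.0 to check for a lost-in-the-middle dip.
rng = random.Random(0)
probes = [build_kv_probe(800, d / 10, "forward", rng) for d in range(11)]
```

Plotting accuracy against `needle_depth` for probes like these is what reveals whether a model retrieves information robustly from every position, as FILM-7B is reported to do across its 32K window.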