Make Your LLM Fully Utilize the Context
April 25, 2024
Authors: Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou
cs.AI
Abstract
While many contemporary large language models (LLMs) can process lengthy
input, they still struggle to fully utilize information within the long
context, known as the lost-in-the-middle challenge. We hypothesize that it
stems from insufficient explicit supervision during long-context training,
which fails to emphasize that any position in a long context can hold crucial
information. Based on this intuition, our study presents information-intensive
(IN2) training, a purely data-driven solution to overcome lost-in-the-middle.
Specifically, IN2 training leverages a synthesized long-context question-answer
dataset, where the answer requires (1) fine-grained information awareness on a
short segment (~128 tokens) within a synthesized long context (4K-32K tokens),
and (2) the integration and reasoning of information from two or more short
segments. By applying this information-intensive training to Mistral-7B, we
present FILM-7B (FILl-in-the-Middle). To thoroughly assess FILM-7B's ability
to utilize long contexts, we design three probing tasks that
encompass various context styles (document, code, and structured-data context)
and information retrieval patterns (forward, backward, and bi-directional
retrieval). The probing results demonstrate that FILM-7B can robustly retrieve
information from different positions in its 32K context window. Beyond these
probing tasks, FILM-7B significantly improves the performance on real-world
long-context tasks (e.g., 23.5->26.9 F1 score on NarrativeQA), while
maintaining comparable performance on short-context tasks (e.g., 59.3->59.2
accuracy on MMLU). GitHub link: https://github.com/microsoft/FILM.
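
To make the IN2 data construction concrete, below is a minimal Python sketch of how one training example of this kind might be synthesized: a short key segment is embedded at a uniformly random position among filler segments, so that supervision can target any depth of the long context. The helper names (`make_qa`, `build_in2_example`) and the toy QA generator are illustrative assumptions, not the paper's actual pipeline, which generates grounded question-answer pairs with a stronger LLM.

```python
import random

def make_qa(segment: str) -> tuple[str, str]:
    # Toy stand-in for QA generation; the paper derives grounded QA pairs
    # from the key segment with a strong instruction-following LLM. Here the
    # "question" simply points at the segment and the "answer" restates it.
    return (f"What fact is stated in the passage beginning '{segment[:32]}'?",
            segment)

def build_in2_example(key_segment: str, filler_pool: list[str],
                      n_segments: int, rng: random.Random) -> dict:
    # Sample unrelated fillers and drop the ~128-token key segment at a
    # uniformly random position, so any position in the long context
    # (4K-32K tokens in the paper) can hold the crucial information.
    fillers = rng.sample(filler_pool, n_segments - 1)
    fillers.insert(rng.randrange(n_segments), key_segment)
    question, answer = make_qa(key_segment)
    return {"context": "\n\n".join(fillers),
            "question": question,
            "answer": answer}

rng = random.Random(0)
pool = [f"Filler passage {i} about an unrelated topic." for i in range(1000)]
example = build_in2_example("The launch code is 7421.", pool,
                            n_segments=256, rng=rng)
```

The second requirement of IN2 training (integrating two or more segments) would extend this by embedding several key segments and asking a question whose answer spans all of them.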
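The probing tasks admit a similarly small sketch. The example below builds a structured-data probe in the spirit described in the abstract: a long list of key-value records with the queried record placed at a controlled relative depth, where "forward" retrieval asks for the value given the key and "backward" asks for the key given the value (a bi-directional probe would combine both). The record format and function name are assumptions for illustration, not the paper's released probes.

```python
import random

def build_kv_probe(n_pairs: int, needle_depth: float,
                   direction: str, rng: random.Random) -> dict:
    # Generate random key-value records; the queried record sits at a
    # controlled relative depth in [0, 1] so retrieval robustness can be
    # measured position by position across the context window.
    records = [(f"key-{rng.getrandbits(32):08x}",
                f"val-{rng.getrandbits(32):08x}")
               for _ in range(n_pairs)]
    key, value = records[int(needle_depth * (n_pairs - 1))]
    context = "\n".join(f"{k}: {v}" for k, v in records)
    if direction == "forward":   # key -> value
        question, answer = f"What is the value of {key}?", value
    else:                        # backward: value -> key
        question, answer = f"Which key has the value {value}?", key
    return {"context": context, "question": question, "answer": answer}

# Sweep depths 0.0 -> 1.0 to check for a lost-in-the-middle dip.
rng = random.Random(0)
probes = [build_kv_probe(800, d / 10, "forward", rng) for d in range(11)]
```

Plotting accuracy against `needle_depth` for probes like these is what reveals whether a model retrieves information robustly from every position, as FILM-7B is reported to do across its 32K window.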