Writing in the Margins: Better Inference Pattern for Long Context Retrieval
August 27, 2024
Authors: Melisa Russak, Umar Jamil, Christopher Bryant, Kiran Kamble, Axel Magnuson, Mateusz Russak, Waseem AlShikh
cs.AI
Abstract
In this paper, we introduce Writing in the Margins (WiM), a new inference
pattern for Large Language Models designed to optimize the handling of long
input sequences in retrieval-oriented tasks. This approach leverages the
chunked prefill of the key-value cache to perform segment-wise inference, which
enables efficient processing of extensive contexts along with the generation
and classification of intermediate information ("margins") that guide the model
towards specific tasks. This method increases computational overhead marginally
while significantly enhancing the performance of off-the-shelf models without
the need for fine-tuning. Specifically, we observe that WiM provides an average
enhancement of 7.5% in accuracy for reasoning skills (HotpotQA, MultiHop-RAG)
and more than a 30.0% increase in the F1-score for aggregation tasks (CWE).
Additionally, we show how the proposed pattern fits into an interactive
retrieval design that provides end-users with ongoing updates about the
progress of context processing, and pinpoints the integration of relevant
information into the final response. We release our implementation of WiM using
the Hugging Face Transformers library at
https://github.com/writer/writing-in-the-margins.
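To make the described pattern concrete, the sketch below illustrates segment-wise inference with chunked prefill of the key-value cache and per-segment "margin" notes, using Hugging Face Transformers. It is a minimal sketch under our own assumptions: the model name, prompts, chunking, and the relevance check are placeholders, not the authors' released implementation (see the repository above for the official code).

```python
# Minimal sketch of margin-style segment-wise inference (illustrative only).
# Assumptions: a single-sequence, no-padding setting; the model name, prompts,
# and the relevance heuristic below are placeholders, not the WiM release.
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder; any causal LM works
device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.bfloat16
).to(device).eval()


@torch.no_grad()
def greedy_decode(prompt_ids, past, max_new_tokens=64):
    """Greedy decoding on top of an existing KV cache, with the forward loop
    kept explicit so the cache handling stays visible."""
    out = model(prompt_ids, past_key_values=past, use_cache=True)
    new_tokens = []
    for _ in range(max_new_tokens):
        next_id = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        if next_id.item() == tok.eos_token_id:
            break
        new_tokens.append(next_id.item())
        out = model(next_id, past_key_values=out.past_key_values, use_cache=True)
    return tok.decode(new_tokens, skip_special_tokens=True)


@torch.no_grad()
def answer_with_margins(segments, query):
    """Chunked prefill of the long context; after each chunk, branch off a copy
    of the cache to write a short margin, keep relevant margins, then answer."""
    past = None
    margins = []
    for segment in segments:
        # 1) Chunked prefill: extend the shared KV cache with the next segment.
        seg_ids = tok(segment, return_tensors="pt").input_ids.to(device)
        past = model(seg_ids, past_key_values=past, use_cache=True).past_key_values

        # 2) Generate a margin note on a *copy* of the cache, so margin text
        #    never pollutes the context cache used for later segments.
        margin_prompt = tok(
            f"\nNote anything above that helps answer: {query}\n",
            return_tensors="pt",
        ).input_ids.to(device)
        margin = greedy_decode(margin_prompt, copy.deepcopy(past))

        # 3) Classify the margin; this string check is a stand-in for the
        #    paper's relevance classification step.
        if "irrelevant" not in margin.lower():
            margins.append(margin.strip())

    # 4) Final answer: feed the retained margins plus the query on top of the
    #    fully prefilled context cache.
    final_prompt = tok(
        "\nNotes:\n" + "\n".join(margins) + f"\nQuestion: {query}\nAnswer:",
        return_tensors="pt",
    ).input_ids.to(device)
    return greedy_decode(final_prompt, past, max_new_tokens=128)
```

The key design point the sketch tries to convey is that the margin generation branches off a copy of the prefilled cache, so intermediate notes can be produced and filtered after every chunk (and surfaced to the user in an interactive setting) without re-encoding the context or contaminating the cache used for the final answer.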