

Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization

June 23, 2024
作者: Cheng-Yu Hsieh, Yung-Sung Chuang, Chun-Liang Li, Zifeng Wang, Long T. Le, Abhishek Kumar, James Glass, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister
cs.AI

Abstract

Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input. This phenomenon is known as the lost-in-the-middle problem. In this work, we make three contributions. First, we set out to understand the factors that cause this phenomenon. In doing so, we establish a connection between lost-in-the-middle and LLMs' intrinsic attention bias: LLMs exhibit a U-shaped attention bias in which the tokens at the beginning and at the end of their input receive higher attention, regardless of their relevance. Second, we mitigate this positional bias through a calibration mechanism, found-in-the-middle, that allows the model to attend to contexts faithfully according to their relevance, even when they are in the middle. Third, we show that found-in-the-middle not only achieves better performance in locating relevant information within a long context, but also ultimately leads to improved retrieval-augmented generation (RAG) performance across various tasks, outperforming existing methods by up to 15 percentage points. These findings open up future directions in understanding LLM attention bias and its potential consequences.
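The calibration idea can be illustrated with a toy sketch. Note that the additive bias model, the function names, and the permutation-averaging estimator below are our illustrative assumptions, not the paper's exact formulation: if attention is (roughly) relevance plus a position-dependent bias, then averaging attention over random reorderings of the documents isolates the positional component per slot, and subtracting it leaves a relevance-driven score.

```python
import numpy as np

rng = np.random.default_rng(0)
n_docs = 10

# Simulated U-shaped positional bias: slots at the ends draw more attention.
pos = np.arange(n_docs)
u_bias = 0.5 * ((pos - n_docs / 2) / (n_docs / 2)) ** 2

# Each document has an intrinsic relevance, independent of where it is placed.
relevance = rng.random(n_docs)

def observed_attention(doc_order):
    # Toy model: attention = relevance of the document + bias of its slot.
    return relevance[doc_order] + u_bias

# Estimate the positional bias by averaging attention over random permutations;
# document relevance averages out per slot, leaving the positional shape.
n_perm = 200
bias_est = np.zeros(n_docs)
for _ in range(n_perm):
    bias_est += observed_attention(rng.permutation(n_docs))
bias_est /= n_perm
bias_est -= bias_est.mean()  # keep only the position-dependent shape

# Calibrate: subtract the estimated bias from the raw attention scores.
order = rng.permutation(n_docs)
raw = observed_attention(order)
calibrated = raw - bias_est
# Ranking documents by `calibrated` now tracks relevance regardless of slot.
```

In this toy setting, the calibrated scores recover the documents' relevance ordering even for documents placed mid-context, which is the effect the calibration mechanism is after.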
