Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization
June 23, 2024
Authors: Cheng-Yu Hsieh, Yung-Sung Chuang, Chun-Liang Li, Zifeng Wang, Long T. Le, Abhishek Kumar, James Glass, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister
cs.AI
Abstract
Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input. This phenomenon is known as the lost-in-the-middle problem. In this work, we make three contributions. First, we set out to understand the factors that cause this phenomenon. In doing so, we establish a connection between the lost-in-the-middle problem and LLMs' intrinsic attention bias: LLMs exhibit a U-shaped attention bias, where tokens at the beginning and end of the input receive higher attention regardless of their relevance. Second, we mitigate this positional bias through a calibration mechanism, found-in-the-middle, that allows the model to attend to contexts faithfully according to their relevance, even when they are in the middle. Third, we show that found-in-the-middle not only achieves better performance in locating relevant information within a long context, but also ultimately leads to improved retrieval-augmented generation (RAG) performance across various tasks, outperforming existing methods by up to 15 percentage points. These findings open up future directions in understanding LLM attention bias and its potential consequences.