Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization

Abstract

Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input. This phenomenon is known as the lost-in-the-middle problem. In this work, we make three contributions. First, we set out to understand the factors that cause this phenomenon. In doing so, we establish a connection between lost-in-the-middle and LLMs' intrinsic attention bias: LLMs exhibit a U-shaped attention bias in which tokens at the beginning and at the end of the input receive higher attention, regardless of their relevance. Second, we mitigate this positional bias through a calibration mechanism, found-in-the-middle, that allows the model to attend to contexts faithfully according to their relevance, even when they are in the middle. Third, we show that found-in-the-middle not only achieves better performance in locating relevant information within a long context, but also leads to improved retrieval-augmented generation (RAG) performance across various tasks, outperforming existing methods by up to 15 percentage points. These findings open up future directions in understanding LLM attention bias and its potential consequences.
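
To make the calibration idea concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the abstract does not spell out the mechanism, so the sketch assumes that the attention a document receives decomposes additively into a relevance term and a position-dependent bias, and that the bias can be estimated from the attention an irrelevant placeholder document receives at each position. The function names `calibrate_attention` and `rank_by_score` and all numbers are hypothetical, chosen only to illustrate a U-shaped bias.

```python
from typing import List, Sequence


def calibrate_attention(
    observed: Sequence[float], positional_baseline: Sequence[float]
) -> List[float]:
    """Remove an estimated positional bias from per-document attention.

    Assumption (not stated in the abstract): observed attention is roughly
    relevance + positional bias, so subtracting a per-position baseline
    leaves a score that tracks relevance.
    """
    if len(observed) != len(positional_baseline):
        raise ValueError("observed and positional_baseline must have equal length")
    return [o - b for o, b in zip(observed, positional_baseline)]


def rank_by_score(scores: Sequence[float]) -> List[int]:
    """Return document positions sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical U-shaped baseline: the first and last positions attract
# attention even when the document placed there is irrelevant.
baseline = [0.30, 0.10, 0.05, 0.10, 0.30]

# Hypothetical observed attention with the relevant document at position 2.
observed = [0.32, 0.12, 0.20, 0.11, 0.31]

calibrated = calibrate_attention(observed, baseline)
raw_ranking = rank_by_score(observed)
calibrated_ranking = rank_by_score(calibrated)

print("top position by raw attention:       ", raw_ranking[0])
print("top position by calibrated attention:", calibrated_ranking[0])
```

In this toy setting the raw scores favor the first position purely because of where it sits, while the calibrated scores put the middle document first. Whether an additive decomposition holds for a real model is an empirical question that the sketch does not address.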
