Attention Basin: Why Contextual Position Matters in Large Language Models

Abstract

The performance of large language models (LLMs) is highly sensitive to where information appears in the input context. Investigating the mechanism behind this positional bias, we find through extensive experiments a consistent phenomenon that we term the attention basin: when presented with a sequence of structured items (e.g., retrieved documents or few-shot examples), models systematically assign higher attention to items at the beginning and end of the sequence and neglect those in the middle. Our analysis further shows that allocating higher attention to critical information is key to improving model performance. Building on these insights, we introduce Attention-Driven Reranking (AttnRank), a two-stage framework that (i) estimates a model's intrinsic positional attention preferences using a small calibration set, and (ii) reorders retrieved documents or few-shot examples so that the most salient content occupies these high-attention positions. AttnRank is model-agnostic, training-free, and plug-and-play, with minimal computational overhead. Experiments on multi-hop QA and few-shot in-context learning tasks show that AttnRank achieves substantial improvements across 10 large language models of varying architectures and scales, without modifying model parameters or training procedures.
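
The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how attention is extracted or how salience is scored, so the sketch assumes that per-slot attention mass has already been measured on a calibration set and that each item comes with a relevance score. The function names `estimate_position_profile` and `attn_rank` are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def estimate_position_profile(attention_samples: Sequence[Sequence[float]]) -> List[float]:
    """Stage (i), assumed form: average per-slot attention over calibration examples.

    Each row holds the attention mass a model gave to each item slot for one
    calibration prompt. How that mass is measured is outside this sketch.
    """
    if not attention_samples:
        raise ValueError("need at least one calibration example")
    n_slots = len(attention_samples[0])
    if any(len(row) != n_slots for row in attention_samples):
        raise ValueError("all calibration rows must have the same number of slots")
    n = len(attention_samples)
    return [sum(row[i] for row in attention_samples) / n for i in range(n_slots)]


def attn_rank(items: Sequence[T], relevance: Sequence[float], profile: Sequence[float]) -> List[T]:
    """Stage (ii), assumed form: put the most relevant item in the most attended slot.

    The k-th most relevant item is assigned to the k-th most attended position.
    """
    if not (len(items) == len(relevance) == len(profile)):
        raise ValueError("items, relevance, and profile must have equal length")
    # Slot indices ordered from most to least attended.
    slots = sorted(range(len(profile)), key=lambda i: profile[i], reverse=True)
    # Item indices ordered from most to least relevant.
    ranked = sorted(range(len(items)), key=lambda i: relevance[i], reverse=True)
    ordered: List[T] = list(items)  # placeholder; every slot is overwritten below
    for slot, item_idx in zip(slots, ranked):
        ordered[slot] = items[item_idx]
    return ordered


if __name__ == "__main__":
    # Toy calibration data with a basin shape: the ends get more attention than the middle.
    profile = estimate_position_profile([[0.5, 0.1, 0.1, 0.3], [0.3, 0.1, 0.3, 0.3]])
    docs = ["doc_a", "doc_b", "doc_c", "doc_d"]
    scores = [0.2, 0.9, 0.5, 0.1]  # hypothetical relevance scores
    print(attn_rank(docs, scores, profile))
```

With a basin-shaped profile, the highest-scoring items land at the first and last slots and the lowest-scoring ones in the middle. The greedy rank-to-rank assignment is only one simple way to realise the alignment the abstract describes.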

