Eliminating Position Bias of Language Models: A Mechanistic Approach

Abstract

Position bias is a prevalent issue in modern language models (LMs): models prioritize content according to its position within the given context. This bias often leads to unexpected model failures and hurts performance, robustness, and reliability across various applications. Our mechanistic analysis attributes position bias to two components used in nearly all state-of-the-art LMs: causal attention and relative positional encodings. Specifically, based on an analysis of retrieval-augmented question answering (QA), we find that causal attention generally causes models to favor distant content, whereas relative positional encodings such as RoPE favor nearby content. Further, our empirical study on object detection reveals that position bias is also present in vision-language models (VLMs). Based on these analyses, we propose to eliminate position bias caused by different input segment orders (e.g., options in LM-as-a-judge, retrieved documents in QA) in a training-free, zero-shot manner. Our method changes causal attention to bidirectional attention between segments and uses model attention values, rather than the order given in the input prompt, to decide the relative order of segments, thereby enabling Position-INvariant inferencE (PINE) at the segment level. By eliminating position bias, models achieve better performance and reliability in downstream tasks where position bias is widespread, such as LM-as-a-judge and retrieval-augmented QA. Notably, PINE is especially useful when adapting LMs to evaluate reasoning pairs: it consistently provides 8 to 10 percentage-point performance gains in most cases and makes Llama-3-70B-Instruct perform even better than GPT-4-0125-preview on the RewardBench reasoning subset.
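The two ingredients named above (bidirectional attention between segments, and ordering segments by attention rather than by prompt order) can be illustrated with a toy, stdlib-only Python sketch. This is not the authors' implementation. Assumptions: attention inside a segment stays causal while tokens in different segments see each other in both directions; the "importance" of a segment is a hypothetical stand-in score (dot product between a query vector and the segment's mean key vector) rather than real model attention values; and more important segments are placed closer to the querying token. The names `build_segment_mask`, `order_by_importance`, and `assign_positions` are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def build_segment_mask(prefix_len: int, segment_lens: List[int], suffix_len: int) -> List[List[bool]]:
    """mask[i][j] is True when query token i may attend to key token j.

    Causal everywhere, except that tokens in two *different* segments may
    attend to each other in both directions (assumed PINE-style rule).
    """
    # Label each token: -1 for prefix/suffix tokens, otherwise its segment index.
    labels = [-1] * prefix_len
    for seg_id, n in enumerate(segment_lens):
        labels += [seg_id] * n
    labels += [-1] * suffix_len

    mask = []
    for i in range(len(labels)):
        row = []
        for j in range(len(labels)):
            cross_segment = labels[i] >= 0 and labels[j] >= 0 and labels[i] != labels[j]
            row.append(cross_segment or j <= i)  # otherwise fall back to causal
        mask.append(row)
    return mask


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> List[float]:
    # Column-wise average of the segment's key vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def order_by_importance(segments: Dict[str, List[Vector]], query: Vector) -> List[str]:
    """Return segment names from farthest to closest to the query token.

    Toy importance score: dot(query, mean key vector). Ties are broken by
    name, so the result never depends on the order segments were given in.
    """
    scores = {name: dot(query, mean(keys)) for name, keys in segments.items()}
    # Ascending score: the most important segment ends up last (nearest the query).
    return sorted(scores, key=lambda name: (scores[name], name))


def assign_positions(order: List[str], lens: Dict[str, int], start: int) -> Dict[str, List[int]]:
    """Give consecutive position ids to segments in the decided order."""
    positions: Dict[str, List[int]] = {}
    pos = start
    for name in order:
        positions[name] = list(range(pos, pos + lens[name]))
        pos += lens[name]
    return positions


if __name__ == "__main__":
    query = [1.0, 0.0]  # hypothetical query vector
    docs = {"doc_x": [[0.9, 0.1]], "doc_y": [[0.1, 0.9], [0.3, 0.7]], "doc_z": [[0.5, 0.5]]}
    shuffled = dict(reversed(list(docs.items())))  # same documents, reversed prompt order
    order = order_by_importance(docs, query)
    print(order)  # ['doc_y', 'doc_z', 'doc_x']
    print(order == order_by_importance(shuffled, query))  # True
    lens = {name: len(keys) for name, keys in docs.items()}
    print(assign_positions(order, lens, start=3))  # {'doc_y': [3, 4], 'doc_z': [5], 'doc_x': [6]}
```

Because the cross-segment mask treats every segment symmetrically and the toy score depends only on segment content, permuting the input segments leaves each segment's assigned positions unchanged. The name-based tie-break is there only so that equal scores cannot reintroduce a dependence on input order.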
