Masking in Multi-hop QA: An Analysis of How Language Models Perform with Context Permutation
May 16, 2025
Authors: Wenyu Huang, Pavlos Vougiouklis, Mirella Lapata, Jeff Z. Pan
cs.AI
Abstract
Multi-hop Question Answering (MHQA) adds layers of complexity to question
answering, making it more challenging. When Language Models (LMs) are prompted
with multiple search results, they are tasked not only with retrieving relevant
information but also employing multi-hop reasoning across the information
sources. Although LMs perform well on traditional question-answering tasks, the
causal mask can hinder their capacity to reason across complex contexts. In
this paper, we explore how LMs respond to multi-hop questions by permuting
search results (retrieved documents) under various configurations. Our study
reveals interesting findings as follows: 1) Encoder-decoder models, such as the
ones in the Flan-T5 family, generally outperform causal decoder-only LMs in
MHQA tasks, despite being significantly smaller in size; 2) altering the order
of gold documents reveals distinct trends in both Flan-T5 models and fine-tuned
decoder-only models, with optimal performance observed when the document order
aligns with the reasoning chain order; 3) enhancing causal decoder-only models
with bi-directional attention by modifying the causal mask can effectively
boost their end performance. In addition to the above, we conduct a thorough
investigation of the distribution of LM attention weights in the context of
MHQA. Our experiments reveal that attention weights tend to peak at higher
values when the resulting answer is correct. We leverage this finding to
heuristically improve LMs' performance on this task. Our code is publicly
available at https://github.com/hwy9855/MultiHopQA-Reasoning.
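The third finding above concerns relaxing the causal mask so that tokens within the retrieved documents can attend to one another bidirectionally while the rest of the prompt remains causal. The following is a minimal PyTorch sketch of that idea, not the authors' released implementation; the function name and the context-span boundaries (`ctx_start`, `ctx_end`) are assumptions introduced for illustration.

```python
# Illustrative sketch: an attention mask that is causal overall but allows
# bidirectional attention within the span occupied by the retrieved documents.
import torch

def build_hybrid_mask(seq_len: int, ctx_start: int, ctx_end: int) -> torch.Tensor:
    """Return a boolean mask where True marks positions a token may attend to.

    All tokens follow standard causal masking, except that tokens inside the
    retrieved-context span [ctx_start, ctx_end) may also attend to later
    tokens within that same span (bidirectional attention over the context).
    """
    # Standard lower-triangular causal mask.
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Allow full attention among the context-document tokens.
    mask[ctx_start:ctx_end, ctx_start:ctx_end] = True
    return mask

if __name__ == "__main__":
    # Example: a 10-token prompt whose retrieved documents occupy positions 2-7.
    m = build_hybrid_mask(seq_len=10, ctx_start=2, ctx_end=8)
    print(m.int())
```

Such a mask would be passed to the model's attention layers in place of the default causal mask; how it is wired in depends on the specific decoder implementation.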