Masking in Multi-hop QA: An Analysis of How Language Models Perform with Context Permutation
May 16, 2025
Authors: Wenyu Huang, Pavlos Vougiouklis, Mirella Lapata, Jeff Z. Pan
cs.AI
Abstract
Multi-hop Question Answering (MHQA) adds layers of complexity to question
answering, making it more challenging. When Language Models (LMs) are prompted
with multiple search results, they are tasked not only with retrieving relevant
information but also with employing multi-hop reasoning across the information
sources. Although LMs perform well on traditional question-answering tasks, the
causal mask can hinder their capacity to reason across complex contexts. In
this paper, we explore how LMs respond to multi-hop questions by permuting
search results (retrieved documents) under various configurations. Our study
reveals several notable findings: 1) encoder-decoder models, such as those
in the Flan-T5 family, generally outperform causal decoder-only LMs in
MHQA tasks, despite being significantly smaller in size; 2) altering the order
of gold documents reveals distinct trends in both Flan-T5 models and fine-tuned
decoder-only models, with optimal performance observed when the document order
aligns with the reasoning chain order; 3) enhancing causal decoder-only models
with bi-directional attention by modifying the causal mask can effectively
boost their end performance. In addition to the above, we conduct a thorough
investigation of the distribution of LM attention weights in the context of
MHQA. Our experiments reveal that attention weights tend to peak at higher
values when the resulting answer is correct. We leverage this finding to
heuristically improve LMs' performance on this task. Our code is publicly
available at https://github.com/hwy9855/MultiHopQA-Reasoning.
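
Finding 3 concerns relaxing the strictly causal mask so that a decoder-only model can attend bidirectionally over the retrieved documents. The snippet below is a minimal sketch of that idea, not the authors' implementation; `build_hybrid_mask`, `seq_len`, and `context_len` are hypothetical names, and the exact span the paper unmasks may differ.

```python
# Minimal sketch (NOT the paper's exact implementation) of finding 3:
# allow bidirectional attention within the retrieved-document span while
# keeping standard causal masking everywhere else. All names here
# (build_hybrid_mask, seq_len, context_len) are hypothetical.
import torch

def build_hybrid_mask(seq_len: int, context_len: int) -> torch.Tensor:
    """Boolean attention mask of shape (seq_len, seq_len).

    True means "may attend". Positions [0, context_len) hold the
    retrieved documents and attend to each other bidirectionally;
    all other positions follow the usual causal (lower-triangular) rule.
    """
    # Start from the standard causal mask.
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Lift the causal restriction inside the document span only.
    mask[:context_len, :context_len] = True
    return mask

# Toy example: a 10-token sequence whose first 6 tokens are documents.
print(build_hybrid_mask(seq_len=10, context_len=6).int())
```

A mask like this can typically be passed to a transformer layer in place of the default causal mask; question and answer tokens still see only their prefix, so generation remains autoregressive.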
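
The abstract does not spell out how the peak-attention observation is turned into a heuristic, so the following is a purely speculative sketch of one way it could be operationalized: generate answers under several document permutations and keep the one whose context attention shows the sharpest peak. `pick_by_attention_peak` and the `candidates` structure are invented for illustration.

```python
# Hypothetical sketch of an attention-peak selection heuristic; the
# abstract does not specify the authors' actual procedure. `candidates`
# pairs each generated answer with the attention weights the model
# placed on the context during that run.
import numpy as np

def pick_by_attention_peak(candidates: list[tuple[str, np.ndarray]]) -> str:
    """Return the answer from the run whose maximum context-attention
    weight is highest, following the finding that correct answers tend
    to coincide with sharper attention peaks."""
    best_answer, _ = max(candidates, key=lambda pair: float(pair[1].max()))
    return best_answer

# Toy usage with made-up numbers: two permutations of the same documents.
runs = [
    ("Paris", np.array([0.10, 0.72, 0.18])),   # sharp attention peak
    ("Lyon",  np.array([0.35, 0.33, 0.32])),   # flat attention distribution
]
print(pick_by_attention_peak(runs))  # -> "Paris"
```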