Pay-Per-Search Models are Abstention Models

Abstract

LLMs cannot reliably recognize the boundaries of their parametric knowledge and often hallucinate answers to questions that fall outside them. In contrast, humans recognize their limitations and can either seek external help for such questions or abstain. In this paper, we introduce MASH (Modeling Abstention via Selective Help-seeking), a training framework that readily extracts abstentions from LLMs. Our key idea is that any external help-seeking by an LLM, i.e., search tool use, can serve as a proxy for abstention if the external help (search) is appropriately penalized while answer accuracy is simultaneously rewarded. MASH operationalizes this idea using reinforcement learning with a pay-per-search reward. We run experiments on three knowledge-intensive QA datasets. Our results show that MASH substantially improves upon the selective help-seeking performance of prior efficient search approaches; on multi-hop datasets, MASH improves answer accuracy by 7.6%. Furthermore, MASH demonstrates strong off-the-shelf abstention: it can distinguish between unanswerable and answerable questions and selectively generate responses for answerable questions, showcasing behavior analogous to specialized abstention approaches. We emphasize that, unlike prior abstention methods, MASH does not require pre-determining knowledge boundaries to construct training data. Instead, MASH's abstentions are a by-product of training for the auxiliary selective help-seeking task. Overall, we show that MASH training effectively aligns search tool use with parametric knowledge, which can be successfully leveraged for making abstention decisions.
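As an illustration of the pay-per-search idea, the following is a minimal sketch and not the reward used in the paper. The abstract does not give the reward's functional form, so the linear penalty, the exact-match accuracy check, the default `cost_per_search` value, and the names `Rollout`, `pay_per_search_reward`, and `abstains` are all assumptions made here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One sampled trajectory (hypothetical container): final answer and search count."""
    answer: str
    num_searches: int


def pay_per_search_reward(rollout: Rollout, gold: str, cost_per_search: float = 0.1) -> float:
    """Reward answer accuracy and charge a fixed cost for every search call.

    Assumption: a linear penalty and case-insensitive exact match; the
    abstract only says search is "appropriately penalized".
    """
    correct = rollout.answer.strip().lower() == gold.strip().lower()
    accuracy = 1.0 if correct else 0.0
    return accuracy - cost_per_search * rollout.num_searches


def abstains(rollout: Rollout) -> bool:
    """Treat any search call as a proxy for abstention (assumed decision rule)."""
    return rollout.num_searches > 0


if __name__ == "__main__":
    from_memory = Rollout(answer="Paris", num_searches=0)
    with_search = Rollout(answer="Paris", num_searches=2)
    print(pay_per_search_reward(from_memory, "paris"), abstains(from_memory))
    print(pay_per_search_reward(with_search, "paris"), abstains(with_search))
```

Under a reward of this shape, a search is worth issuing only when it raises expected accuracy by more than its cost, which is the pressure that would tie search calls to gaps in parametric knowledge and let a search call be read as an abstention signal.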
