Pay-Per-Search Models are Abstention Models

October 1, 2025
Authors: Mustafa Omer Gul, Claire Cardie, Tanya Goyal
cs.AI

Abstract

LLMs cannot reliably recognize their parametric knowledge boundaries and often hallucinate answers to outside-of-boundary questions. In contrast, humans recognize their limitations and can either seek external help for such questions or abstain. In this paper, we introduce MASH (Modeling Abstention via Selective Help-seeking), a training framework that readily extracts abstentions from LLMs. Our key idea is that any external help-seeking by an LLM, i.e. search tool use, can serve as a proxy for abstention if the external help (search) is appropriately penalized while simultaneously rewarding answer accuracy. MASH operationalizes this idea using reinforcement learning with a pay-per-search reward. We run experiments on three knowledge-intensive QA datasets. Our results show that MASH substantially improves upon the selective help-seeking performance of prior efficient search approaches; on multi-hop datasets, MASH improves answer accuracy by 7.6%. Furthermore, MASH demonstrates strong off-the-shelf abstention -- it can distinguish between unanswerable/answerable questions and selectively generate responses for answerable questions -- showcasing behavior analogous to specialized abstention approaches. We emphasize that contrary to prior abstention methods, MASH does not require pre-determining knowledge boundaries to construct training data. Instead, MASH's abstentions are a by-product of training for the auxiliary selective help-seeking task. Overall, we show that MASH training effectively aligns search tool use with parametric knowledge, which can be successfully leveraged for making abstention decisions.
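
The pay-per-search idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact reward formula, so the linear penalty, the `search_cost` value, the exact-match correctness check, and all names below (`Rollout`, `pay_per_search_reward`, `abstains`) are assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One model episode (hypothetical structure): final answer plus search count."""
    answer: str
    num_searches: int


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace for a simple exact-match comparison."""
    return " ".join(text.lower().split())


def pay_per_search_reward(rollout: Rollout, gold: str, search_cost: float = 0.1) -> float:
    """Reward answer accuracy while charging a fixed cost per search call.

    Assumption: a linear penalty. The paper's actual reward may differ.
    """
    correct = 1.0 if normalize(rollout.answer) == normalize(gold) else 0.0
    return correct - search_cost * rollout.num_searches


def abstains(rollout: Rollout) -> bool:
    """Treat any search call as a proxy for abstention, per the abstract's key idea."""
    return rollout.num_searches > 0


if __name__ == "__main__":
    parametric = Rollout(answer="Paris", num_searches=0)
    searched = Rollout(answer="Paris", num_searches=2)
    print(pay_per_search_reward(parametric, "paris"), abstains(parametric))
    print(pay_per_search_reward(searched, "paris"), abstains(searched))
```

Under this sketch, a correct answer produced without searching earns the highest reward, so a policy trained against it is pushed to search only when its parametric knowledge is insufficient. That is what makes the decision to search usable as an abstention signal.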