ChatPaper.ai

Pay-Per-Search Models are Abstention Models

October 1, 2025
Authors: Mustafa Omer Gul, Claire Cardie, Tanya Goyal
cs.AI

Abstract

LLMs cannot reliably recognize their parametric knowledge boundaries and often hallucinate answers to outside-of-boundary questions. In contrast, humans recognize their limitations and can either seek external help for such questions or abstain. In this paper, we introduce MASH (Modeling Abstention via Selective Help-seeking), a training framework that readily extracts abstentions from LLMs. Our key idea is that any external help-seeking by an LLM, i.e., search tool use, can serve as a proxy for abstention if the external help (search) is appropriately penalized while answer accuracy is simultaneously rewarded. MASH operationalizes this idea using reinforcement learning with a pay-per-search reward. We run experiments on three knowledge-intensive QA datasets. Our results show that MASH substantially improves upon the selective help-seeking performance of prior efficient search approaches; on multi-hop datasets, MASH improves answer accuracy by 7.6%. Furthermore, MASH demonstrates strong off-the-shelf abstention -- it can distinguish between unanswerable and answerable questions and selectively generate responses for answerable questions -- showcasing behavior analogous to specialized abstention approaches. We emphasize that, contrary to prior abstention methods, MASH does not require pre-determining knowledge boundaries to construct training data. Instead, MASH's abstentions are a by-product of training for the auxiliary selective help-seeking task. Overall, we show that MASH training effectively aligns search tool use with parametric knowledge, which can be successfully leveraged for making abstention decisions.
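The pay-per-search idea in the abstract can be sketched as a simple reward function: correct answers earn a fixed bonus, every search call is charged a flat cost, and at evaluation time any help-seeking is read as an abstention signal. This is an illustrative reconstruction, not the authors' implementation; the function names, the per-search cost of 0.2, and the abstention mapping are all assumptions.

```python
# Illustrative sketch of a pay-per-search RL reward (assumed constants,
# not the paper's actual implementation). The policy learns to search
# only when its parametric knowledge is insufficient, because each
# search call eats into the accuracy bonus.

def pay_per_search_reward(is_correct: bool, num_searches: int,
                          search_cost: float = 0.2) -> float:
    """Accuracy bonus minus a flat per-search charge."""
    answer_reward = 1.0 if is_correct else 0.0
    return answer_reward - search_cost * num_searches


def abstention_decision(num_searches: int) -> str:
    """Read help-seeking as a proxy for abstention: a trained model that
    reaches for search signals that the question lies outside its
    parametric knowledge boundary."""
    return "abstain" if num_searches > 0 else "answer"
```

Under this reward shape, a correct answer produced from parametric knowledge alone earns the full reward, while a correct answer bought with two searches earns only 0.6, so searching pays off only when answering from memory would likely be wrong.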