Pay-Per-Search Models are Abstention Models

Abstract

LLMs cannot reliably recognize the boundaries of their parametric knowledge and often hallucinate answers to questions that fall outside them. In contrast, humans recognize their limitations and can either seek external help for such questions or abstain. In this paper, we introduce MASH (Modeling Abstention via Selective Help-seeking), a training framework that readily extracts abstentions from LLMs. Our key idea is that any external help-seeking by an LLM, i.e., search tool use, can serve as a proxy for abstention if the external help (search) is appropriately penalized while answer accuracy is simultaneously rewarded. MASH operationalizes this idea using reinforcement learning with a pay-per-search reward. We run experiments on three knowledge-intensive QA datasets. Our results show that MASH substantially improves upon the selective help-seeking performance of prior efficient search approaches; on multi-hop datasets, MASH improves answer accuracy by 7.6%. Furthermore, MASH demonstrates strong off-the-shelf abstention: it can distinguish between unanswerable and answerable questions and selectively generate responses for answerable ones, showcasing behavior analogous to specialized abstention approaches. We emphasize that, contrary to prior abstention methods, MASH does not require pre-determining knowledge boundaries to construct training data. Instead, MASH's abstentions are a by-product of training for the auxiliary selective help-seeking task. Overall, we show that MASH training effectively aligns search tool use with parametric knowledge, which can be successfully leveraged for making abstention decisions.
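
As a rough illustration of the pay-per-search idea, the sketch below rewards a correct answer and charges a fixed cost for every search call, then reads any search request as an abstention signal. The abstract does not give the reward formula, so the linear penalty, the `search_cost` value, and the function names `pay_per_search_reward` and `abstains` are all assumptions made for illustration and should not be read as MASH's actual implementation.

```python
def pay_per_search_reward(is_correct: bool, num_searches: int,
                          search_cost: float = 0.1) -> float:
    """Hypothetical pay-per-search reward (not the paper's exact formula).

    A correct answer earns 1.0 minus a fixed cost per search call,
    floored at 0.0. An incorrect answer earns 0.0 regardless of searches.
    """
    if num_searches < 0:
        raise ValueError("num_searches must be non-negative")
    if not is_correct:
        return 0.0
    return max(0.0, 1.0 - search_cost * num_searches)


def abstains(num_searches: int) -> bool:
    """Assumed inference-time rule: any search request counts as abstention."""
    return num_searches > 0
```

Under a reward of this shape, a policy keeps the full reward only when it answers correctly from parametric knowledge without searching, so a decision to search becomes evidence that the question lies outside what the model knows.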
