Pay-Per-Search Models are Abstention Models

Abstract

LLMs cannot reliably recognize the boundaries of their parametric knowledge and often hallucinate answers to questions that fall outside those boundaries. In contrast, humans recognize their limitations and can either seek external help for such questions or abstain. In this paper, we introduce MASH (Modeling Abstention via Selective Help-seeking), a training framework that readily extracts abstentions from LLMs. Our key idea is that any external help-seeking by an LLM, i.e., search tool use, can serve as a proxy for abstention if the external help (search) is appropriately penalized while answer accuracy is simultaneously rewarded. MASH operationalizes this idea using reinforcement learning with a pay-per-search reward. We run experiments on three knowledge-intensive QA datasets. Our results show that MASH substantially improves upon the selective help-seeking performance of prior efficient search approaches; on multi-hop datasets, MASH improves answer accuracy by 7.6%. Furthermore, MASH demonstrates strong off-the-shelf abstention: it can distinguish between unanswerable and answerable questions and selectively generate responses for answerable questions, showcasing behavior analogous to specialized abstention approaches. We emphasize that, contrary to prior abstention methods, MASH does not require pre-determining knowledge boundaries to construct training data. Instead, MASH's abstentions are a by-product of training for the auxiliary selective help-seeking task. Overall, we show that MASH training effectively aligns search tool use with parametric knowledge, which can be successfully leveraged for making abstention decisions.
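
The pay-per-search idea can be illustrated with a minimal sketch. The abstract does not give the exact reward, so everything here is an assumption made for illustration: the linear penalty, the cost of 0.1 per search, and the names Rollout, pay_per_search_reward, and is_abstention are all hypothetical. The sketch only shows the general shape: accuracy is rewarded, each search call is charged, and a search request is read as an abstention.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One model attempt at a question (hypothetical container)."""
    answer_correct: bool  # did the final answer match the reference?
    num_searches: int     # how many search-tool calls the model made


def pay_per_search_reward(rollout: Rollout, cost_per_search: float = 0.1) -> float:
    """Reward answer accuracy and charge a fixed price per search call.

    The linear penalty and the 0.1 default are illustrative assumptions,
    not the reward defined in the paper.
    """
    accuracy_reward = 1.0 if rollout.answer_correct else 0.0
    return accuracy_reward - cost_per_search * rollout.num_searches


def is_abstention(rollout: Rollout) -> bool:
    """Treat any request for search as a proxy for abstaining."""
    return rollout.num_searches > 0
```

Under these assumed values, a correct answer with no search scores highest, so a policy trained on such a reward has an incentive to search only when its parametric knowledge is insufficient. That is the behavior the abstract proposes to reuse as an abstention signal.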
