QueryBandits for Hallucination Mitigation: Exploiting Semantic Features for No-Regret Rewriting

Abstract

Advanced reasoning capabilities in Large Language Models (LLMs) have led to a higher prevalence of hallucinations, yet most mitigation work focuses on after-the-fact filtering rather than on shaping the queries that trigger them. We introduce QueryBandits, a bandit framework that designs rewrite strategies to maximize a reward model encapsulating hallucination propensity, based on the sensitivities of 17 linguistic features of the input query, and thereby proactively steers LLMs away from generating hallucinations. Across 13 diverse QA benchmarks and 1,050 lexically perturbed queries per dataset, our top contextual QueryBandit (Thompson Sampling) achieves an 87.5% win rate over a no-rewrite baseline and also outperforms zero-shot static prompting ("paraphrase" or "expand") by 42.6% and 60.3%, respectively. These results empirically substantiate the effectiveness of QueryBandits in mitigating hallucination through an intervention that takes the form of a query rewrite. Interestingly, certain static prompting strategies, which make up a considerable share of the current query-rewriting literature, incur a higher cumulative regret than the no-rewrite baseline, indicating that static rewrites can worsen hallucination. Moreover, the converged per-arm regression feature-weight vectors show that no single rewrite strategy is optimal for all queries. In this context, guided rewriting that exploits semantic features with QueryBandits can induce significant shifts in output behavior purely through forward-pass mechanisms, bypassing the need for retraining or gradient-based adaptation.
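To make the contextual bandit idea concrete, the following is a minimal sketch of linear Thompson Sampling in which each rewrite strategy is an arm with its own Bayesian linear regression over a 17-dimensional query feature vector, and cumulative regret is compared against a static "no_rewrite" policy. It is not the authors' implementation: the binary feature encoding, the arm names beyond "no_rewrite", "paraphrase" and "expand", the Gaussian reward model, and the simulated ground-truth weights are all hypothetical assumptions chosen for illustration. It uses only the Python standard library.

```python
import math
import random

# Hypothetical arms: "no_rewrite", "paraphrase" and "expand" appear in the abstract;
# "simplify" and "disambiguate" are illustrative placeholders.
ARMS = ["no_rewrite", "paraphrase", "expand", "simplify", "disambiguate"]
N_FEATURES = 17  # assumed: one binary entry per linguistic feature of the query


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


def cholesky(m):
    """Lower-triangular L with L L^T = m (diagonal clamped for numerical safety)."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L


class LinearThompsonSampling:
    """Per-arm Bayesian linear regression with Thompson Sampling (sketch)."""

    def __init__(self, n_arms, dim, noise_var=0.25, prior_var=1.0, seed=0):
        self.rng = random.Random(seed)
        self.dim = dim
        self.noise_var = noise_var
        # Posterior covariance per arm, starting from the prior N(0, prior_var * I).
        self.cov = [[[prior_var if i == j else 0.0 for j in range(dim)]
                     for i in range(dim)] for _ in range(n_arms)]
        self.b = [[0.0] * dim for _ in range(n_arms)]  # sum of reward * x / noise_var

    def mean(self, arm):
        """Posterior mean of the arm's feature-weight vector."""
        return matvec(self.cov[arm], self.b[arm])

    def select(self, x):
        """Sample one weight vector per arm; pick the arm with the best sampled reward."""
        best_arm, best_score = 0, -math.inf
        for arm in range(len(self.b)):
            mu = self.mean(arm)
            L = cholesky(self.cov[arm])
            z = [self.rng.gauss(0.0, 1.0) for _ in range(self.dim)]
            w = [mu[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(self.dim)]
            score = dot(w, x)
            if score > best_score:
                best_arm, best_score = arm, score
        return best_arm

    def update(self, arm, x, reward):
        """Rank-one posterior update via the Sherman-Morrison formula."""
        S = self.cov[arm]
        Sx = matvec(S, x)
        denom = self.noise_var + dot(x, Sx)
        for i in range(self.dim):
            for j in range(self.dim):
                S[i][j] -= Sx[i] * Sx[j] / denom
        for i in range(self.dim):
            self.b[arm][i] += reward * x[i] / self.noise_var


def simulate(n_rounds=600, seed=1):
    rng = random.Random(seed)
    # Hypothetical ground truth: independent linear reward weights per arm,
    # so different arms are best for different queries.
    true_w = [[rng.gauss(0.0, 0.5) for _ in range(N_FEATURES)] for _ in ARMS]
    bandit = LinearThompsonSampling(len(ARMS), N_FEATURES, noise_var=0.25, seed=seed + 1)
    regret_bandit = regret_baseline = 0.0
    for _ in range(n_rounds):
        x = [float(rng.random() < 0.5) for _ in range(N_FEATURES)]
        expected = [dot(w, x) for w in true_w]
        arm = bandit.select(x)
        reward = expected[arm] + rng.gauss(0.0, 0.5)  # noise std 0.5 matches noise_var
        bandit.update(arm, x, reward)
        best = max(expected)
        regret_bandit += best - expected[arm]
        regret_baseline += best - expected[0]  # static policy: always "no_rewrite"
    return regret_bandit, regret_baseline, bandit


if __name__ == "__main__":
    rb, r0, _ = simulate()
    print(f"cumulative regret: bandit={rb:.1f}, no_rewrite={r0:.1f}")
```

In this toy setting, comparing `bandit.mean(arm)` across arms after a run shows different features favoring different arms, which loosely mirrors the abstract's observation that no single rewrite strategy is optimal for every query. Sherman-Morrison keeps each update at O(d^2) instead of re-inverting the precision matrix.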