QueryBandits for Hallucination Mitigation: Exploiting Semantic Features for No-Regret Rewriting

Abstract



Advanced reasoning capabilities in Large Language Models (LLMs) have led to a higher prevalence of hallucinations; yet most mitigation work focuses on after-the-fact filtering rather than on shaping the queries that trigger them. We introduce QueryBandits, a bandit framework that designs rewrite strategies to maximize a reward model that encapsulates hallucination propensity based on the sensitivities of 17 linguistic features of the input query, and thereby proactively steers LLMs away from generating hallucinations. Across 13 diverse QA benchmarks and 1,050 lexically perturbed queries per dataset, our top contextual QueryBandit (Thompson Sampling) achieves an 87.5% win rate over a no-rewrite baseline and also outperforms zero-shot static prompting ("paraphrase" or "expand") by 42.6% and 60.3%, respectively. These results empirically substantiate the effectiveness of QueryBandits in mitigating hallucination through an intervention in the form of a query rewrite. Interestingly, certain static prompting strategies, which make up a considerable share of the current query-rewriting literature, incur a higher cumulative regret than the no-rewrite baseline, signifying that static rewrites can worsen hallucination. Moreover, the converged per-arm regression feature-weight vectors confirm that no single rewrite strategy is optimal for all queries. In this context, guided rewriting that exploits semantic features through QueryBandits can induce significant shifts in output behavior through forward-pass mechanisms alone, bypassing the need for retraining or gradient-based adaptation.
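The abstract does not spell out the bandit's internals, so the following is only a rough illustration under stated assumptions, not the authors' implementation. It assumes a linear contextual bandit in which each rewrite strategy (arm) keeps its own Bayesian linear regression over a 17-dimensional query feature vector, and Thompson Sampling picks the arm whose sampled weights give the highest predicted reward. The arm names, the binary features, the simulated reward, and the names `LinearThompsonArm`, `choose_arm`, and `run_simulation` are all hypothetical.

```python
import math
import random

# Hypothetical sketch: contextual Thompson Sampling with one Bayesian linear
# regression per rewrite strategy (arm). The 17-dim context stands in for the
# linguistic query features; arm names and rewards are illustrative only.
NUM_FEATURES = 17
ARMS = ["paraphrase", "expand", "simplify"]  # hypothetical arm set


def cholesky(m):
    """Return lower-triangular L with L @ L.T == m (m symmetric positive definite)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                # Clamp guards against tiny negative values from rounding drift.
                low[i][i] = math.sqrt(max(m[i][i] - s, 1e-12))
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def dot(u, w):
    return sum(a * b for a, b in zip(u, w))


class LinearThompsonArm:
    """Gaussian posterior over weights theta for reward ~ theta . x + noise."""

    def __init__(self, dim, lam=1.0):
        self.dim = dim
        # a_inv = (lam * I + sum_t x_t x_t^T)^-1, kept current via Sherman-Morrison.
        self.a_inv = [[1.0 / lam if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward_t * x_t

    def mean(self):
        return [dot(row, self.b) for row in self.a_inv]

    def sample(self, rng, v):
        """Draw theta ~ N(mean, v^2 * a_inv); v scales exploration."""
        mu = self.mean()
        low = cholesky(self.a_inv)
        z = [rng.gauss(0.0, 1.0) for _ in range(self.dim)]
        return [mu[i] + v * sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(self.dim)]

    def update(self, x, reward):
        ax = [dot(row, x) for row in self.a_inv]  # a_inv @ x (a_inv is symmetric)
        denom = 1.0 + dot(x, ax)
        for i in range(self.dim):
            for j in range(self.dim):
                self.a_inv[i][j] -= ax[i] * ax[j] / denom
            self.b[i] += reward * x[i]


def choose_arm(arms, x, rng, v=0.5):
    """Thompson Sampling: sample each arm's weights, pick the best sampled score."""
    scores = [dot(arm.sample(rng, v), x) for arm in arms]
    return max(range(len(arms)), key=scores.__getitem__)


def run_simulation(rounds=500, seed=0, noise=0.1):
    """Play against hidden per-arm weights; return cumulative regret per round."""
    rng = random.Random(seed)
    true_w = [[rng.uniform(-1, 1) for _ in range(NUM_FEATURES)] for _ in ARMS]
    arms = [LinearThompsonArm(NUM_FEATURES) for _ in ARMS]
    cumulative, total = [], 0.0
    for _ in range(rounds):
        # Binary indicators standing in for "query exhibits linguistic feature k".
        x = [float(rng.random() < 0.3) for _ in range(NUM_FEATURES)]
        a = choose_arm(arms, x, rng)
        expected = [dot(w, x) for w in true_w]
        arms[a].update(x, expected[a] + rng.gauss(0.0, noise))
        total += max(expected) - expected[a]  # regret vs. best arm for this query
        cumulative.append(total)
    return cumulative


if __name__ == "__main__":
    regret = run_simulation()
    print(f"cumulative regret after {len(regret)} rounds: {regret[-1]:.2f}")
```

Because each arm learns its own weight vector, different queries can favor different rewrites, which mirrors the per-arm weight finding described above. The regret in this toy run is measured against hidden simulated weights, not against any real hallucination reward model.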
