QueryBandits for Hallucination Mitigation: Exploiting Semantic Features for No-Regret Rewriting
August 22, 2025
Authors: Nicole Cho, William Watson, Alec Koppel, Sumitra Ganesh, Manuela Veloso
cs.AI
Abstract
Advanced reasoning capabilities in Large Language Models (LLMs) have led to a
higher prevalence of hallucinations; yet most mitigation work focuses on
after-the-fact filtering rather than shaping the queries that trigger them. We
introduce QueryBandits, a bandit framework that designs rewrite strategies to
maximize a reward model encapsulating hallucination propensity, based on the
sensitivities of 17 linguistic features of the input query, thereby
proactively steering LLMs away from generating hallucinations. Across 13 diverse
QA benchmarks and 1,050 lexically perturbed queries per dataset, our top
contextual QueryBandit (Thompson Sampling) achieves an 87.5% win rate over a
no-rewrite baseline and also outperforms zero-shot static prompting
("paraphrase" or "expand") by 42.6% and 60.3% respectively. Therefore, we
empirically substantiate the effectiveness of QueryBandits in mitigating
hallucination via the intervention that takes the form of a query rewrite.
Interestingly, certain static prompting strategies, which account for a
considerable portion of the current query-rewriting literature, incur a higher
cumulative regret than the no-rewrite baseline, signifying that static rewrites
can worsen hallucination. Moreover, we discover that the converged per-arm
regression feature weight vectors substantiate that no single rewrite strategy
is optimal for all queries. In this context, guided rewriting that exploits
semantic features via QueryBandits can induce significant shifts in
output behavior through forward-pass mechanisms, bypassing the need for
retraining or gradient-based adaptation.
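The contextual Thompson Sampling QueryBandit described above can be sketched as a linear bandit in which each rewrite arm maintains a Bayesian linear-regression posterior over a feature-weight vector, with the context being the query's linguistic features. This is a minimal illustrative sketch, not the paper's implementation: the arm names, the 17-dimensional feature vector, and the prior/noise variances below are assumptions.

```python
import numpy as np

class LinearTSQueryBandit:
    """Sketch of a contextual Thompson Sampling bandit over rewrite arms.

    Each arm keeps a Bayesian linear-regression posterior over a
    feature-weight vector; the context x is the query's linguistic
    features, and the reward signals absence of hallucination.
    """

    def __init__(self, arms, n_features, noise_var=0.25, prior_var=1.0):
        self.arms = list(arms)
        self.noise_var = noise_var
        # Per-arm posterior precision A and moment vector b:
        # posterior mean = A^{-1} b, posterior covariance = A^{-1}.
        self.A = {a: np.eye(n_features) / prior_var for a in self.arms}
        self.b = {a: np.zeros(n_features) for a in self.arms}

    def select(self, x, rng):
        # Sample a weight vector from each arm's posterior and pick
        # the arm whose sampled expected reward w @ x is highest.
        best_arm, best_val = None, -np.inf
        for a in self.arms:
            cov = np.linalg.inv(self.A[a])
            mean = cov @ self.b[a]
            w = rng.multivariate_normal(mean, cov)
            val = w @ x
            if val > best_val:
                best_arm, best_val = a, val
        return best_arm

    def update(self, arm, x, reward):
        # Rank-one conjugate update with the observed reward.
        self.A[arm] += np.outer(x, x) / self.noise_var
        self.b[arm] += reward * x / self.noise_var
```

A usage sketch: build the bandit over hypothetical arms such as `no_rewrite`, `paraphrase`, and `expand`, featurize the incoming query into a 17-dimensional vector, call `select` to choose a rewrite strategy, apply it, and feed the resulting hallucination-based reward back through `update`. Because arm selection and posterior updates happen entirely outside the LLM, this steers output behavior through forward passes alone, with no retraining or gradient-based adaptation of the model.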