No One Size Fits All: QueryBandits for Hallucination Mitigation
February 23, 2026
Authors: Nicole Cho, William Watson, Alec Koppel, Sumitra Ganesh, Manuela Veloso
cs.AI
Abstract
Advanced reasoning capabilities in Large Language Models (LLMs) have led to more frequent hallucinations; yet most mitigation work focuses on open-source models for post-hoc detection and parameter editing. The dearth of studies focusing on hallucinations in closed-source models is especially concerning, as they constitute the vast majority of models in institutional deployments. We introduce QueryBandits, a model-agnostic contextual bandit framework that adaptively learns online to select the optimal query-rewrite strategy by leveraging an empirically validated and calibrated reward function. Across 16 QA scenarios, our top QueryBandit (Thompson Sampling) achieves an 87.5% win rate over a No-Rewrite baseline and outperforms zero-shot static policies (e.g., Paraphrase or Expand) by 42.6% and 60.3%, respectively. Moreover, all contextual bandits outperform vanilla bandits across all datasets, with higher feature variance coinciding with greater variance in arm selection. This substantiates our finding that there is no single rewrite policy optimal for all queries. We also discover that certain static policies incur higher cumulative regret than No-Rewrite, indicating that an inflexible query-rewriting policy can worsen hallucinations. Thus, learning an online policy over semantic features with QueryBandits can shift model behavior purely through forward-pass mechanisms, enabling its use with closed-source models and bypassing the need for retraining or gradient-based adaptation.
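The abstract describes a contextual bandit that uses Thompson Sampling to pick a query-rewrite arm from a query's semantic features, learning online from a calibrated reward. A minimal sketch of that idea is shown below using per-arm Bayesian linear regression (LinTS); the arm names, feature dimension, and noise/prior variances are illustrative assumptions, not the paper's exact QueryBandits formulation or reward function.

```python
import numpy as np

# Hypothetical arm set; the paper's actual rewrite strategies may differ.
REWRITE_ARMS = ["no_rewrite", "paraphrase", "expand", "decompose"]

class LinearThompsonBandit:
    """Per-arm Bayesian linear regression with Thompson Sampling.

    Each arm keeps a Gaussian posterior over a weight vector theta;
    at each round we sample theta for every arm, score the query's
    feature vector, and play the arm with the highest sampled score.
    """

    def __init__(self, n_arms, n_features, noise_var=0.25, prior_var=1.0):
        # Precision matrix and weighted-response vector per arm.
        self.A = [np.eye(n_features) / prior_var for _ in range(n_arms)]
        self.b = [np.zeros(n_features) for _ in range(n_arms)]
        self.noise_var = noise_var

    def select_arm(self, x, rng):
        """Sample a weight vector per arm and pick the best score for x."""
        scores = []
        for A, b in zip(self.A, self.b):
            cov = np.linalg.inv(A)
            mean = cov @ b
            theta = rng.multivariate_normal(mean, self.noise_var * cov)
            scores.append(theta @ x)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Bayesian update of the played arm's posterior."""
        self.A[arm] += np.outer(x, x) / self.noise_var
        self.b[arm] += reward * x / self.noise_var

# Usage: reward would come from a hallucination-aware grader of the
# rewritten query's answer; here we only wire up the interface.
rng = np.random.default_rng(0)
bandit = LinearThompsonBandit(len(REWRITE_ARMS), n_features=3)
x = np.ones(3) / np.sqrt(3)          # stand-in semantic feature vector
arm = bandit.select_arm(x, rng)
bandit.update(arm, x, reward=1.0)    # feed back the observed reward
```

Because the policy only requires the model's forward pass to score rewrites (the reward is computed from outputs, not gradients), this style of online learning works with closed-source models, which is the point the abstract emphasizes.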