QueryBandits for Hallucination Mitigation: Exploiting Semantic Features for No-Regret Rewriting
August 22, 2025
Authors: Nicole Cho, William Watson, Alec Koppel, Sumitra Ganesh, Manuela Veloso
cs.AI
Abstract
Advanced reasoning capabilities in Large Language Models (LLMs) have led to a
higher prevalence of hallucinations; yet most mitigation work focuses on
after-the-fact filtering rather than shaping the queries that trigger them. We
introduce QueryBandits, a bandit framework that designs rewrite strategies to
maximize a reward model encapsulating hallucination propensity, based on the
sensitivities of 17 linguistic features of the input query, thereby
proactively steering LLMs away from generating hallucinations. Across 13 diverse
QA benchmarks and 1,050 lexically perturbed queries per dataset, our top
contextual QueryBandit (Thompson Sampling) achieves an 87.5% win rate over a
no-rewrite baseline and also outperforms zero-shot static prompting
("paraphrase" or "expand") by 42.6% and 60.3% respectively. Therefore, we
empirically substantiate the effectiveness of QueryBandits in mitigating
hallucination via the intervention that takes the form of a query rewrite.
Interestingly, certain static prompting strategies, which account for a
considerable portion of the current query-rewriting literature, incur a higher
cumulative regret than the no-rewrite baseline, signifying that static rewrites
can worsen hallucination. Moreover, we discover that the converged per-arm
regression feature weight vectors substantiate that no single rewrite strategy
is optimal for all queries. In this context, guided rewriting that exploits
semantic features via QueryBandits can induce significant shifts in
output behavior through forward-pass mechanisms, bypassing the need for
retraining or gradient-based adaptation.
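The contextual Thompson Sampling QueryBandit described above can be sketched as a linear bandit in which each rewrite arm maintains a Bayesian linear-regression posterior over a feature-weight vector, with the context being the query's linguistic features. This is a minimal illustrative sketch, not the paper's implementation: the arm names, the 17-dimensional feature vector, and the prior/noise variances below are assumptions.

```python
import numpy as np

class LinearTSQueryBandit:
    """Sketch of a contextual Thompson Sampling bandit over rewrite arms.

    Each arm keeps a Bayesian linear-regression posterior over a
    feature-weight vector; the context x is the query's linguistic
    features, and the reward signals absence of hallucination.
    """

    def __init__(self, arms, n_features, noise_var=0.25, prior_var=1.0):
        self.arms = list(arms)
        self.noise_var = noise_var
        # Per-arm posterior precision A and moment vector b:
        # posterior mean = A^{-1} b, posterior covariance = A^{-1}.
        self.A = {a: np.eye(n_features) / prior_var for a in self.arms}
        self.b = {a: np.zeros(n_features) for a in self.arms}

    def select(self, x, rng):
        # Sample a weight vector from each arm's posterior and pick
        # the arm whose sampled expected reward w @ x is highest.
        best_arm, best_val = None, -np.inf
        for a in self.arms:
            cov = np.linalg.inv(self.A[a])
            mean = cov @ self.b[a]
            w = rng.multivariate_normal(mean, cov)
            val = w @ x
            if val > best_val:
                best_arm, best_val = a, val
        return best_arm

    def update(self, arm, x, reward):
        # Rank-one conjugate update with the observed reward.
        self.A[arm] += np.outer(x, x) / self.noise_var
        self.b[arm] += reward * x / self.noise_var
```

A usage sketch: build the bandit over hypothetical arms such as `no_rewrite`, `paraphrase`, and `expand`, featurize the incoming query into a 17-dimensional vector, call `select` to choose a rewrite strategy, apply it, and feed the resulting hallucination-based reward back through `update`. Because arm selection and posterior updates happen entirely outside the LLM, this steers output behavior through forward passes alone, with no retraining or gradient-based adaptation of the model.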