QueryBandits for Hallucination Mitigation: Exploiting Semantic Features for No-Regret Rewriting

Abstract

Advanced reasoning capabilities in Large Language Models (LLMs) have led to a higher prevalence of hallucinations, yet most mitigation work focuses on after-the-fact filtering rather than on shaping the queries that trigger them. We introduce QueryBandits, a bandit framework that designs rewrite strategies to maximize a reward model. The reward model encapsulates hallucination propensity based on the sensitivities of 17 linguistic features of the input query, so the framework proactively steers LLMs away from generating hallucinations. Across 13 diverse QA benchmarks and 1,050 lexically perturbed queries per dataset, our top contextual QueryBandit (Thompson Sampling) achieves an 87.5% win rate over a no-rewrite baseline and outperforms zero-shot static prompting ("paraphrase" or "expand") by 42.6% and 60.3%, respectively. These results empirically substantiate the effectiveness of QueryBandits in mitigating hallucination through an intervention that takes the form of a query rewrite. Interestingly, certain static prompting strategies, which account for a considerable share of the current query-rewriting literature, incur higher cumulative regret than the no-rewrite baseline, which indicates that static rewrites can worsen hallucination. Moreover, the converged per-arm regression feature weight vectors show that no single rewrite strategy is optimal for all queries. In this context, guided rewriting that exploits semantic features with QueryBandits can induce significant shifts in output behavior through forward-pass mechanisms, bypassing the need for retraining or gradient-based adaptation.
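To make the setup described above concrete, the following is a minimal sketch of a contextual linear Thompson Sampling bandit that chooses a rewrite arm from a 17-dimensional query feature vector. It is an illustration under stated assumptions, not the authors' implementation: the class name `LinearThompsonSampling`, the arm list, the binary feature encoding, the Gaussian linear reward model, and the synthetic rewards in the demo are all hypothetical. Only the count of 17 features and the "paraphrase" and "expand" rewrite names come from the abstract.

```python
import numpy as np

N_FEATURES = 17  # from the abstract: 17 linguistic features per query
# Illustrative arm set; only "paraphrase" and "expand" are named in the abstract.
ARMS = ["no_rewrite", "paraphrase", "expand"]


class LinearThompsonSampling:
    """Per-arm Bayesian linear regression with posterior sampling (hypothetical sketch)."""

    def __init__(self, n_arms, n_features, prior_scale=1.0, noise_scale=0.5, seed=0):
        self.rng = np.random.default_rng(seed)
        self.noise_scale = noise_scale
        # A[arm] is the regularized Gram matrix, b[arm] the reward-weighted feature sum.
        self.A = np.stack([np.eye(n_features) / prior_scale for _ in range(n_arms)])
        self.b = np.zeros((n_arms, n_features))

    def posterior_mean(self, arm):
        # Ridge-regression estimate of this arm's feature weight vector.
        return np.linalg.solve(self.A[arm], self.b[arm])

    def select(self, x):
        # Sample one weight vector per arm and pick the arm with the best sampled score.
        scores = []
        for arm in range(len(self.A)):
            cov = self.noise_scale ** 2 * np.linalg.inv(self.A[arm])
            cov = (cov + cov.T) / 2.0  # symmetrize against floating-point drift
            theta = self.rng.multivariate_normal(self.posterior_mean(arm), cov)
            scores.append(float(x @ theta))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        # Rank-one update of the chosen arm's sufficient statistics.
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x


if __name__ == "__main__":
    bandit = LinearThompsonSampling(len(ARMS), N_FEATURES)
    rng = np.random.default_rng(1)
    # Synthetic per-arm weights stand in for a real reward model (an assumption).
    true_w = rng.normal(size=(len(ARMS), N_FEATURES))
    for _ in range(2000):
        x = rng.integers(0, 2, size=N_FEATURES).astype(float)  # assumed binary features
        arm = bandit.select(x)
        reward = float(x @ true_w[arm]) + rng.normal(scale=0.1)
        bandit.update(arm, x, reward)
    for i, name in enumerate(ARMS):
        print(name, np.round(bandit.posterior_mean(i)[:3], 2))
```

In this sketch each arm keeps its own weight vector, which mirrors the abstract's observation that per-arm regression weights differ: a feature that favors one rewrite can disfavor another, so the chosen arm depends on the query's features.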
