Guided Decoding and Its Critical Role in Retrieval-Augmented Generation
September 8, 2025
Authors: Özgür Uğur, Musa Yılmaz, Esra Şavirdi, Özay Ezerceli, Mahmut El Huseyni, Selva Taş, Reyhan Bayraktar
cs.AI
Abstract
The integration of Large Language Models (LLMs) into various applications has driven the need for structured and reliable responses. A key challenge in Retrieval-Augmented Generation (RAG) systems is ensuring that outputs align with expected formats while minimizing hallucinations. This study examines the role of guided decoding in RAG systems, comparing three methods (Outlines, XGrammar, and LM Format Enforcer) across different multi-turn prompting setups (0-turn, 1-turn, and 2-turn). By evaluating success rates, hallucination rates, and output quality, we provide insights into their performance and applicability. Our findings reveal how multi-turn interactions influence guided decoding, uncovering unexpected performance variations that can inform method selection for specific use cases. This work advances the understanding of structured output generation in RAG systems, offering both theoretical insights and practical guidance for LLM deployment.
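
The core idea behind guided decoding, as the abstract describes it, is that generation is constrained so the output always matches an expected format. A minimal, self-contained sketch of this idea is shown below; the tiny vocabulary, mock scoring function, and allowed outputs are hypothetical placeholders, and none of the three libraries studied here (Outlines, XGrammar, LM Format Enforcer) is used, although they implement the same principle by compiling regexes, JSON Schemas, or grammars into token-level masks.

```python
"""Minimal sketch of guided (constrained) decoding.

At every generation step, tokens that would take the output outside the
allowed format are masked out before selection, so the final string is
guaranteed to match one of the expected outputs. The vocabulary, the mock
scoring function, and the allowed outputs are illustrative placeholders,
not the paper's setup or any specific library's API.
"""
import random

VOCAB = ['{"answer": ', '"yes"', '"no"', '"maybe"', '}', 'Sure, here is']
# The structured outputs the decoder is allowed to produce.
VALID_OUTPUTS = ['{"answer": "yes"}', '{"answer": "no"}']


def allowed_tokens(prefix: str) -> list[str]:
    """Tokens that keep the partial output a prefix of some valid output."""
    return [t for t in VOCAB
            if any(v.startswith(prefix + t) for v in VALID_OUTPUTS)]


def mock_scores(prefix: str) -> dict[str, float]:
    """Stand-in for model logits: random preferences over the vocabulary."""
    return {t: random.random() for t in VOCAB}


def guided_decode(max_steps: int = 10) -> str:
    out = ''
    for _ in range(max_steps):
        candidates = allowed_tokens(out)
        if not candidates:          # nothing legal can be appended
            break
        scores = mock_scores(out)
        # Greedy pick among *allowed* tokens only; an unconstrained decoder
        # could instead pick 'Sure, here is' and break the format.
        out += max(candidates, key=scores.get)
        if out in VALID_OUTPUTS:    # a complete structured answer was emitted
            return out
    return out


if __name__ == '__main__':
    print(guided_decode())
```

In this toy setting the masking step is what guarantees format compliance regardless of the model's raw preferences; the libraries compared in the paper differ mainly in how they express the constraint and how efficiently they compute the per-step token mask.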