Guided Decoding and Its Critical Role in Retrieval-Augmented Generation

Abstract

The integration of Large Language Models (LLMs) into various applications has driven the need for structured and reliable responses. A key challenge in Retrieval-Augmented Generation (RAG) systems is ensuring that outputs conform to the expected format while minimizing hallucinations. This study examines the role of guided decoding in RAG systems, comparing three methods (Outlines, XGrammar, and LM Format Enforcer) across three multi-turn prompting setups: 0-turn, 1-turn, and 2-turn. By evaluating success rates, hallucination rates, and output quality, we provide insights into their performance and applicability. Our findings show how multi-turn interactions influence guided decoding and uncover unexpected performance variations that can inform method selection for specific use cases. This work advances the understanding of structured output generation in RAG systems, offering both theoretical insights and practical guidance for LLM deployment.
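As a rough illustration of what guided decoding means in general, the sketch below constrains a greedy decoder so that every step keeps the output a valid prefix of an allowed structured response. It is a toy under stated assumptions, not how Outlines, XGrammar, or LM Format Enforcer are implemented: the vocabulary `VOCAB`, the allowed set `ALLOWED`, and the scoring function `toy_scores` are all hypothetical stand-ins for a real tokenizer, grammar or schema, and model.

```python
# Toy vocabulary and a hypothetical set of allowed outputs
# (a stand-in for a grammar or JSON schema).
VOCAB = ["yes", "no", "maybe", "{", "}", '"answer"', ":", '"yes"', '"no"', " "]
ALLOWED = ['{"answer":"yes"}', '{"answer":"no"}']


def allowed_token_ids(prefix: str) -> list[int]:
    """Ids of tokens that keep prefix + token a prefix of some allowed output."""
    return [
        i for i, tok in enumerate(VOCAB)
        if any(target.startswith(prefix + tok) for target in ALLOWED)
    ]


def toy_scores(prefix: str) -> list[float]:
    """Hypothetical model scores: the unconstrained favorite is free-form 'maybe'."""
    prefs = {"maybe": 3.0, "yes": 2.0, '"yes"': 1.5, '"no"': 1.0}
    return [prefs.get(tok, 0.0) for tok in VOCAB]


def guided_greedy_decode(score_fn, max_steps: int = 16) -> str:
    """Greedy decoding where disallowed tokens are masked out at each step."""
    out = ""
    for _ in range(max_steps):
        if out in ALLOWED:          # a complete, valid output has been produced
            break
        ids = allowed_token_ids(out)
        if not ids:                 # no valid continuation exists
            break
        scores = score_fn(out)
        best = max(ids, key=lambda i: scores[i])  # argmax over allowed tokens only
        out += VOCAB[best]
    return out


# Without the mask, the first greedy choice would be the highest-scoring token overall.
scores0 = toy_scores("")
unguided_first = VOCAB[max(range(len(VOCAB)), key=lambda i: scores0[i])]
result = guided_greedy_decode(toy_scores)
```

Here the unconstrained decoder would start with the free-form token `maybe`, while the masked decoder can only emit tokens that lead to one of the two allowed JSON strings. Real guided-decoding libraries apply the same idea to model logits using compiled grammars or schemas rather than an enumerated set of strings.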
