Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation

Abstract

Retrieval-augmented generation (RAG) has proven effective in mitigating the hallucination problem of large language models (LLMs). However, the difficulty of aligning the retriever with the knowledge preferences of diverse LLMs poses an inevitable challenge in developing a reliable RAG system. To address this issue, we propose DPA-RAG, a universal framework designed to align diverse knowledge preferences within RAG systems. Specifically, we first introduce a preference knowledge construction pipeline and incorporate five novel query augmentation strategies to alleviate preference data scarcity. Based on the preference data, DPA-RAG accomplishes both external and internal preference alignment: 1) It jointly integrates pair-wise, point-wise, and contrastive preference alignment abilities into the reranker, achieving external preference alignment among RAG components. 2) It further introduces a pre-aligned stage before vanilla Supervised Fine-tuning (SFT), enabling LLMs to implicitly capture knowledge aligned with their reasoning preferences, achieving internal alignment of the LLM. Experimental results across four knowledge-intensive QA datasets demonstrate that DPA-RAG outperforms all baselines and seamlessly integrates both black-box and open-source LLM readers. Further qualitative analysis and discussion also provide empirical guidance for building reliable RAG systems. Our code is publicly available at https://github.com/dongguanting/DPA-RAG.
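To make the idea of combining point-wise, pair-wise, and contrastive signals in a reranker more concrete, the following is a minimal sketch in plain Python. The abstract does not give the loss formulas, so everything here is an assumption for illustration: the function names, the choice of binary cross-entropy, logistic ranking loss, and an InfoNCE-style loss, and the equal default weights are hypothetical and are not taken from the DPA-RAG code.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pointwise_loss(score: float, label: int) -> float:
    """Binary cross-entropy on one document's logit (label 1 = preferred by the LLM)."""
    return softplus(-score) if label == 1 else softplus(score)


def pairwise_loss(score_preferred: float, score_other: float) -> float:
    """Logistic ranking loss: small when the preferred document scores higher."""
    return softplus(-(score_preferred - score_other))


def contrastive_loss(
    score_preferred: float,
    other_scores: Sequence[float],
    temperature: float = 1.0,
) -> float:
    """InfoNCE-style loss: negative log-softmax of the preferred document."""
    logits = [score_preferred / temperature] + [s / temperature for s in other_scores]
    m = max(logits)  # subtract the max before exponentiating for stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[0]


def combined_loss(
    score_preferred: float,
    other_scores: Sequence[float],
    weights: Sequence[float] = (1.0, 1.0, 1.0),  # hypothetical equal weighting
) -> float:
    """Weighted sum of the three losses for one query (illustrative only)."""
    if not other_scores:
        raise ValueError("need at least one non-preferred document")
    w_point, w_pair, w_con = weights
    n_docs = 1 + len(other_scores)
    point = (
        pointwise_loss(score_preferred, 1)
        + sum(pointwise_loss(s, 0) for s in other_scores)
    ) / n_docs
    pair = sum(pairwise_loss(score_preferred, s) for s in other_scores) / len(other_scores)
    con = contrastive_loss(score_preferred, other_scores)
    return w_point * point + w_pair * pair + w_con * con


if __name__ == "__main__":
    # Reranker logits for one query: one preferred document, two others.
    good = combined_loss(3.0, [-1.0, -2.0])  # preferred document ranked first
    bad = combined_loss(-3.0, [1.0, 2.0])    # preferred document ranked last
    print(good < bad)
```

In this sketch, each term penalizes a different kind of mistake: the point-wise term looks at each document's label in isolation, the pair-wise term looks at relative order, and the contrastive term compares the preferred document against the whole candidate set. How the actual framework defines and weights these objectives should be checked against the paper and its repository.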
