CoRAG: Collaborative Retrieval-Augmented Generation

Abstract

Retrieval-Augmented Generation (RAG) models excel in knowledge-intensive tasks, especially under few-shot learning constraints. We introduce CoRAG, a framework that extends RAG to collaborative settings, where clients jointly train a shared model using a collaborative passage store. To evaluate CoRAG, we introduce CRAB, a benchmark for collaborative homogeneous open-domain question answering. Our experiments demonstrate that CoRAG consistently outperforms both parametric collaborative learning methods and locally trained RAG models in low-resource scenarios. Further analysis reveals the critical importance of relevant passages within the shared store, the surprising benefits of incorporating irrelevant passages, and the potential for hard negatives to degrade performance. This introduces a novel consideration in collaborative RAG: the trade-off between leveraging a collectively enriched knowledge base and the risk of incorporating detrimental passages from other clients. Our findings underscore the viability of CoRAG, while also highlighting key design challenges and promising avenues for future research.
