Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Abstract



Despite their remarkable capabilities, large language models (LLMs) often produce factually inaccurate responses because they rely solely on the parametric knowledge they encapsulate. Retrieval-Augmented Generation (RAG), an ad hoc approach that augments LMs with retrieved relevant knowledge, reduces such issues. However, indiscriminately retrieving and incorporating a fixed number of passages, regardless of whether retrieval is necessary or the passages are relevant, can diminish LM versatility or lead to unhelpful responses. We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG) that enhances an LM's quality and factuality through retrieval and self-reflection. Our framework trains a single arbitrary LM that adaptively retrieves passages on demand, and that generates and reflects on the retrieved passages and its own generations using special tokens called reflection tokens. Generating reflection tokens makes the LM controllable during inference, enabling it to tailor its behavior to diverse task requirements. Experiments show that Self-RAG (7B and 13B parameters) significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a diverse set of tasks. Specifically, Self-RAG outperforms ChatGPT and retrieval-augmented Llama2-chat on open-domain QA, reasoning, and fact verification tasks, and it shows significant gains in factuality and citation accuracy for long-form generation relative to these models.
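The abstract names two inference-time controls: retrieving only on demand, and using reflection tokens to steer what the model outputs. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It shows how reflection-token probabilities could gate retrieval and rerank candidate continuations. The `Candidate` fields, the 0.2 retrieval threshold, and the weight names are hypothetical placeholders; the three critique scores are loosely modeled on the relevance, support, and usefulness judgments that Self-RAG's reflection tokens express.

```python
"""Hypothetical sketch: reflection-token scores steering Self-RAG-style inference.

Assumptions (not specified in the abstract): the LM exposes normalized
probabilities for its reflection tokens; all field names, the retrieval
threshold, and the weights below are illustrative placeholders.
"""
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    """One candidate continuation produced with one retrieved passage."""
    name: str
    log_prob: float     # LM log-probability of the continuation text
    p_relevant: float   # probability the passage is judged relevant
    p_supported: float  # probability the output is judged supported by the passage
    p_useful: float     # probability of a high usefulness rating


def should_retrieve(p_retrieve_yes: float, threshold: float = 0.2) -> bool:
    """Adaptive retrieval: fetch passages only when the model's own
    'retrieve' probability exceeds a tunable threshold."""
    return p_retrieve_yes > threshold


def critique_score(c: Candidate, weights: Dict[str, float]) -> float:
    """LM likelihood plus a weighted sum of reflection-token probabilities."""
    return (
        c.log_prob
        + weights["relevant"] * c.p_relevant
        + weights["supported"] * c.p_supported
        + weights["useful"] * c.p_useful
    )


def select(cands: List[Candidate], weights: Dict[str, float]) -> Candidate:
    """Pick the highest-scoring candidate under the given weights."""
    return max(cands, key=lambda c: critique_score(c, weights))


# Two hypothetical candidates: one more fluent, one better supported.
fluent = Candidate("fluent", log_prob=-1.0, p_relevant=0.9, p_supported=0.1, p_useful=0.9)
grounded = Candidate("grounded", log_prob=-1.5, p_relevant=0.9, p_supported=0.95, p_useful=0.8)

likelihood_only = {"relevant": 0.0, "supported": 0.0, "useful": 0.0}
factuality_first = {"relevant": 1.0, "supported": 1.0, "useful": 0.5}

if __name__ == "__main__":
    print(should_retrieve(0.6))  # True: the model signals it needs evidence
    print(select([fluent, grounded], likelihood_only).name)   # fluent
    print(select([fluent, grounded], factuality_first).name)  # grounded
```

Only the weights change between the last two calls, which illustrates the sense in which reflection tokens can make behavior adjustable at inference time without retraining. A real system would read these probabilities from the model's output distribution over its reflection tokens rather than hard-coding them.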
