Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
October 17, 2023
Authors: Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, Hannaneh Hajishirzi
cs.AI
Abstract
Despite their remarkable capabilities, large language models (LLMs) often produce responses containing factual inaccuracies due to their sole reliance on the parametric knowledge they encapsulate. Retrieval-Augmented Generation (RAG), an ad hoc approach that augments LMs with retrieval of relevant knowledge, decreases such issues. However, indiscriminately retrieving and incorporating a fixed number of retrieved passages, regardless of whether retrieval is necessary or passages are relevant, diminishes LM versatility or can lead to unhelpful response generation. We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG) that enhances an LM's quality and factuality through retrieval and self-reflection. Our framework trains a single arbitrary LM that adaptively retrieves passages on-demand, and generates and reflects on retrieved passages and its own generations using special tokens, called reflection tokens. Generating reflection tokens makes the LM controllable during the inference phase, enabling it to tailor its behavior to diverse task requirements. Experiments show that Self-RAG (7B and 13B parameters) significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a diverse set of tasks. Specifically, Self-RAG outperforms ChatGPT and retrieval-augmented Llama2-chat on open-domain QA, reasoning, and fact verification tasks, and it shows significant gains in improving factuality and citation accuracy for long-form generations relative to these models.
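
The abstract describes an inference procedure in which the model first decides whether to retrieve, then critiques retrieved passages and its own candidate continuations via reflection tokens. Below is a minimal sketch of how such a loop could look. The helper names (lm.predict_token, lm.generate_segment, retriever.retrieve_passages) and the reflection-token labels ("[Retrieve]", "[IsRel]", "[IsSup]", "[IsUse]") are illustrative assumptions for this sketch, not the paper's actual interface.

    # Hypothetical sketch of a Self-RAG-style inference loop; all interfaces
    # below are assumed for illustration and do not reproduce the paper's code.
    def self_rag_generate(lm, retriever, prompt, weights):
        output = []
        context = prompt
        while not lm.is_finished(context):
            # 1. The LM emits a reflection token deciding whether retrieval is needed here.
            decision = lm.predict_token(context, candidates=["[Retrieve]", "[No Retrieve]"])
            if decision == "[Retrieve]":
                passages = retriever.retrieve_passages(context, k=5)
                candidates = []
                for p in passages:
                    # 2. Generate a segment conditioned on each retrieved passage, together
                    #    with critique tokens scoring relevance, support, and usefulness.
                    segment, critique = lm.generate_segment(context, passage=p)
                    # 3. Rank candidates by a weighted combination of critique-token
                    #    probabilities; the weights are adjustable at inference time.
                    score = (weights["rel"] * critique["[IsRel]"]
                             + weights["sup"] * critique["[IsSup]"]
                             + weights["use"] * critique["[IsUse]"])
                    candidates.append((score, segment))
                _, best = max(candidates, key=lambda c: c[0])
            else:
                # No retrieval: continue from parametric knowledge alone.
                best, _ = lm.generate_segment(context, passage=None)
            output.append(best)
            context += best
        return "".join(output)

Because the choice among candidate segments is a weighted combination of critique-token scores, changing the weights at inference time shifts the model's behavior (e.g., favoring citation support over fluency) without retraining, which is the controllability property the abstract highlights.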