

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

October 17, 2023
Authors: Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, Hannaneh Hajishirzi
cs.AI

Abstract

Despite their remarkable capabilities, large language models (LLMs) often produce responses containing factual inaccuracies due to their sole reliance on the parametric knowledge they encapsulate. Retrieval-Augmented Generation (RAG), an ad hoc approach that augments LMs with retrieval of relevant knowledge, decreases such issues. However, indiscriminately retrieving and incorporating a fixed number of retrieved passages, regardless of whether retrieval is necessary, or passages are relevant, diminishes LM versatility or can lead to unhelpful response generation. We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG) that enhances an LM's quality and factuality through retrieval and self-reflection. Our framework trains a single arbitrary LM that adaptively retrieves passages on-demand, and generates and reflects on retrieved passages and its own generations using special tokens, called reflection tokens. Generating reflection tokens makes the LM controllable during the inference phase, enabling it to tailor its behavior to diverse task requirements. Experiments show that Self-RAG (7B and 13B parameters) significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a diverse set of tasks. Specifically, Self-RAG outperforms ChatGPT and retrieval-augmented Llama2-chat on Open-domain QA, reasoning and fact verification tasks, and it shows significant gains in improving factuality and citation accuracy for long-form generations relative to these models.
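The adaptive loop the abstract describes — decide whether retrieval is needed, critique each retrieved passage for relevance, generate grounded candidates, and self-critique them for support and utility — can be sketched roughly as follows. This is an illustrative sketch, not the paper's implementation: the method names (`predict_retrieve`, `predict_relevant`, `score_support`, `score_utility`) are hypothetical stand-ins for the special reflection tokens that the single trained LM actually emits, and the real system combines critique scores through a weighted decoding scheme.

```python
# Illustrative sketch of a Self-RAG-style inference loop.
# `model` and `retriever` are hypothetical interfaces; in the actual
# framework a single LM emits reflection tokens that play these roles.

def self_rag_answer(question, model, retriever, top_k=3):
    # 1. Reflection step: does this input need retrieval at all?
    #    If not, answer from parametric knowledge alone.
    if not model.predict_retrieve(question):
        return model.generate(question, passage=None)

    best, best_score = None, float("-inf")
    for passage in retriever(question, top_k):
        # 2. Critique the passage: skip it if judged irrelevant.
        if not model.predict_relevant(question, passage):
            continue
        # 3. Generate a candidate grounded in the passage, then
        #    self-critique it: is it supported by the passage, and
        #    is it a useful answer overall?
        candidate = model.generate(question, passage)
        score = (model.score_support(candidate, passage)
                 + model.score_utility(question, candidate))
        if score > best_score:
            best, best_score = candidate, score

    # Fall back to parametric-only generation if no passage survived critique.
    return best if best is not None else model.generate(question, passage=None)
```

Because the critique signals are ordinary tokens generated at inference time, the same trade-offs (e.g., how heavily to weight support versus utility) can be adjusted per task without retraining, which is what the abstract means by the LM being "controllable during the inference phase".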