Toward General Instruction-Following Alignment for Retrieval-Augmented Generation
October 12, 2024
Authors: Guanting Dong, Xiaoshuai Song, Yutao Zhu, Runqi Qiao, Zhicheng Dou, Ji-Rong Wen
cs.AI
Abstract
Following natural instructions is crucial for the effective application of
Retrieval-Augmented Generation (RAG) systems. Despite recent advancements in
Large Language Models (LLMs), research on assessing and improving
instruction-following (IF) alignment within the RAG domain remains limited. To
address this issue, we propose VIF-RAG, the first automated, scalable, and
verifiable synthetic pipeline for instruction-following alignment in RAG
systems. We start by manually crafting a minimal set of atomic instructions
(<100) and developing combination rules to synthesize and verify complex
instructions for a seed set. We then use supervised models for instruction
rewriting while simultaneously generating code to automate the verification of
instruction quality via a Python executor. Finally, we integrate these
instructions with extensive RAG and general data samples, scaling up to a
high-quality VIF-RAG-QA dataset (>100k) through automated processes. To further
bridge the gap in instruction-following auto-evaluation for RAG systems, we
introduce the FollowRAG benchmark, which includes approximately 3K test samples,
covering 22 categories of general instruction constraints and four
knowledge-intensive QA datasets. Due to its robust pipeline design, FollowRAG
can seamlessly integrate with different RAG benchmarks. Using FollowRAG and
eight widely used IF and foundational-ability benchmarks for LLMs, we
demonstrate that VIF-RAG markedly enhances LLM performance across a broad range
of general instruction constraints while effectively leveraging its
capabilities in RAG scenarios. Further analysis offers practical insights for
achieving IF alignment in RAG systems. Our code and datasets are released at
https://FollowRAG.github.io.
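The abstract describes pairing each instruction constraint with generated code so a Python executor can automatically verify whether a response satisfies it. A minimal sketch of that idea is shown below; the constraint types, function names, and composition rule here are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of executor-based instruction verification in the
# spirit of VIF-RAG: each atomic constraint maps to a small Python checker,
# and a composed (complex) instruction passes only if every checker passes.
# All constraint names below are illustrative, not taken from the paper.

def max_words(limit):
    """Atomic constraint: the response must contain at most `limit` words."""
    return lambda response: len(response.split()) <= limit

def must_include(keyword):
    """Atomic constraint: the response must mention `keyword`."""
    return lambda response: keyword.lower() in response.lower()

def verify(response, checkers):
    """A composed instruction is satisfied only if all atomic checks pass."""
    return all(check(response) for check in checkers)

# Compose two atomic constraints into one complex instruction.
checkers = [max_words(20), must_include("retrieval")]

print(verify("Retrieval-augmented generation grounds answers in documents.",
             checkers))                      # passes both checks
print(verify("A very long answer ... " * 20, checkers))  # fails the word limit
```

Because each checker is ordinary executable code, verification scales automatically: synthesized instructions whose checkers reject the paired response can be filtered out without human review, which is what makes a >100k verified dataset feasible.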