
Toward General Instruction-Following Alignment for Retrieval-Augmented Generation

October 12, 2024
Authors: Guanting Dong, Xiaoshuai Song, Yutao Zhu, Runqi Qiao, Zhicheng Dou, Ji-Rong Wen
cs.AI

Abstract

Following natural instructions is crucial for the effective application of Retrieval-Augmented Generation (RAG) systems. Despite recent advancements in Large Language Models (LLMs), research on assessing and improving instruction-following (IF) alignment within the RAG domain remains limited. To address this issue, we propose VIF-RAG, the first automated, scalable, and verifiable synthetic pipeline for instruction-following alignment in RAG systems. We start by manually crafting a minimal set of atomic instructions (<100) and developing combination rules to synthesize and verify complex instructions for a seed set. We then use supervised models for instruction rewriting while simultaneously generating code to automate the verification of instruction quality via a Python executor. Finally, we integrate these instructions with extensive RAG and general data samples, scaling up to a high-quality VIF-RAG-QA dataset (>100k) through automated processes. To further bridge the gap in instruction-following auto-evaluation for RAG systems, we introduce FollowRAG Benchmark, which includes approximately 3K test samples, covering 22 categories of general instruction constraints and four knowledge-intensive QA datasets. Due to its robust pipeline design, FollowRAG can seamlessly integrate with different RAG benchmarks. Using FollowRAG and eight widely-used IF and foundational abilities benchmarks for LLMs, we demonstrate that VIF-RAG markedly enhances LLM performance across a broad range of general instruction constraints while effectively leveraging its capabilities in RAG scenarios. Further analysis offers practical insights for achieving IF alignment in RAG systems. Our code and datasets are released at https://FollowRAG.github.io.
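The abstract describes pairing each synthesized instruction with generated verification code that a Python executor runs to filter out low-quality samples. A minimal sketch of that idea follows; the specific constraint types, function names, and sample text are illustrative assumptions, not taken from the paper or its released code.

```python
# Sketch of executor-verifiable instruction constraints, in the spirit of
# VIF-RAG's automated quality checks. Constraints and names are illustrative.

def check_max_words(response: str, limit: int) -> bool:
    """Verify a 'respond in at most N words' constraint."""
    return len(response.split()) <= limit

def check_contains_keyword(response: str, keyword: str) -> bool:
    """Verify a 'mention the keyword X' constraint (case-insensitive)."""
    return keyword.lower() in response.lower()

def verify(response: str, checks) -> bool:
    """Run every generated check; keep the sample only if all pass."""
    return all(fn(response, arg) for fn, arg in checks)

# A composed instruction: "answer in at most 20 words and mention 'retrieval'".
checks = [
    (check_max_words, 20),
    (check_contains_keyword, "retrieval"),
]

sample = "Retrieval-augmented generation grounds answers in retrieved documents."
print(verify(sample, checks))  # True: 8 words, mentions "retrieval"
```

Because each constraint is checked by executable code rather than a judge model, rejected samples can be filtered deterministically at scale, which is what lets the pipeline grow the dataset past 100k examples without manual review.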
