Toward General Instruction-Following Alignment for Retrieval-Augmented Generation

Abstract

Following natural instructions is crucial for the effective application of Retrieval-Augmented Generation (RAG) systems. Despite recent advances in Large Language Models (LLMs), research on assessing and improving instruction-following (IF) alignment within the RAG domain remains limited. To address this issue, we propose VIF-RAG, the first automated, scalable, and verifiable synthetic pipeline for instruction-following alignment in RAG systems. We start by manually crafting a minimal set of atomic instructions (<100) and developing combination rules to synthesize and verify complex instructions for a seed set. We then use supervised models to rewrite the instructions while simultaneously generating code that verifies instruction quality automatically via a Python executor. Finally, we integrate these instructions with extensive RAG and general data samples, scaling up to a high-quality VIF-RAG-QA dataset (>100k) through automated processes. To further bridge the gap in automatic instruction-following evaluation for RAG systems, we introduce the FollowRAG Benchmark, which includes approximately 3K test samples covering 22 categories of general instruction constraints and four knowledge-intensive QA datasets. Owing to its robust pipeline design, FollowRAG can integrate seamlessly with different RAG benchmarks. Using FollowRAG and eight widely used benchmarks of LLM instruction-following and foundational abilities, we demonstrate that VIF-RAG markedly enhances LLM performance across a broad range of general instruction constraints while effectively leveraging its capabilities in RAG scenarios. Further analysis offers practical insights for achieving IF alignment in RAG systems. Our code and datasets are released at https://FollowRAG.github.io.
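As an illustration of what "verifiable" instructions can mean in practice, the following is a minimal sketch of atomic constraints paired with executable checks and combined into a composite instruction. It is not the VIF-RAG implementation: the constraint names, the `verify` and `follows_all` helpers, and the sample responses are all hypothetical and chosen only to show the idea of checking a response with code.

```python
from typing import Callable, Dict

# Hypothetical atomic constraints: each checker maps a response to pass/fail.
# None of these names come from the VIF-RAG code base.
Checker = Callable[[str], bool]


def max_words(limit: int) -> Checker:
    """Pass when the response has at most `limit` whitespace-separated words."""
    return lambda response: len(response.split()) <= limit


def includes_keyword(keyword: str) -> Checker:
    """Pass when `keyword` appears in the response, ignoring case."""
    return lambda response: keyword.lower() in response.lower()


def ends_with(suffix: str) -> Checker:
    """Pass when the response, minus trailing whitespace, ends with `suffix`."""
    return lambda response: response.rstrip().endswith(suffix)


def verify(response: str, checkers: Dict[str, Checker]) -> Dict[str, bool]:
    """Run every named checker and report a per-constraint result."""
    return {name: check(response) for name, check in checkers.items()}


def follows_all(response: str, checkers: Dict[str, Checker]) -> bool:
    """A composite instruction is followed only if every atomic check passes."""
    return all(verify(response, checkers).values())


# A composite instruction assembled from three atomic constraints.
composed: Dict[str, Checker] = {
    "max_words_12": max_words(12),
    "includes_paris": includes_keyword("Paris"),
    "ends_with_period": ends_with("."),
}

good = "The capital of France is Paris."
bad = "It is a large European city that many tourists visit every single year"

if __name__ == "__main__":
    print(verify(good, composed))
    print(verify(bad, composed))
```

Keeping each constraint as a small pure function makes the per-constraint results easy to inspect, which is one plausible way to filter synthesized instruction and response pairs automatically.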
