CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation

Abstract

Retrieval-Augmented Generation (RAG) has become a powerful paradigm for enhancing large language models (LLMs) through external knowledge retrieval. Despite widespread attention, existing academic research focuses predominantly on single-turn RAG, leaving a significant gap in addressing the complexities of multi-turn conversations found in real-world applications. To bridge this gap, we introduce CORAL, a large-scale benchmark designed to assess RAG systems in realistic multi-turn conversational settings. CORAL includes diverse information-seeking conversations automatically derived from Wikipedia and tackles key challenges such as open-domain coverage, knowledge intensity, free-form responses, and topic shifts. It supports three core tasks of conversational RAG: passage retrieval, response generation, and citation labeling. We propose a unified framework to standardize various conversational RAG methods and conduct a comprehensive evaluation of these methods on CORAL, demonstrating substantial opportunities for improving existing approaches.
