UniversalRAG: Retrieval-Augmented Generation over Multiple Corpora with Diverse Modalities and Granularities

Abstract

Retrieval-Augmented Generation (RAG) has shown substantial promise in improving factual accuracy by grounding model responses in external knowledge relevant to the query. However, most existing RAG approaches are limited to a text-only corpus. Recent efforts have extended RAG to other modalities such as images and videos, but they typically operate over a single modality-specific corpus. Real-world queries, in contrast, vary widely in the type of knowledge they require, and no single type of knowledge source can cover them all. To address this, we introduce UniversalRAG, a novel RAG framework designed to retrieve and integrate knowledge from heterogeneous sources with diverse modalities and granularities. Our design is motivated by the observation that forcing all modalities into a unified representation space derived from a single combined corpus causes a modality gap, in which retrieval tends to favor items from the same modality as the query. We therefore propose a modality-aware routing mechanism that dynamically identifies the most appropriate modality-specific corpus and performs targeted retrieval within it. Beyond modality, we also organize each modality into multiple granularity levels, enabling fine-grained retrieval tailored to the complexity and scope of the query. We validate UniversalRAG on eight benchmarks spanning multiple modalities, showing its superiority over both modality-specific and unified baselines.
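
To make the route-then-retrieve idea concrete, here is a minimal Python sketch. It is not the authors' implementation: the corpus names, the keyword rules in `route`, and the word-overlap scoring in `retrieve` are all hypothetical stand-ins, chosen only so the example runs without any model or external library. A real system would presumably use a learned or prompted router and embedding-based retrievers for each corpus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One retrievable entry; `text` stands in for its content or caption."""
    corpus: str
    text: str


# Hypothetical corpora keyed by "modality/granularity"; the names are illustrative only.
CORPORA = {
    "text/paragraph": [
        Item("text/paragraph", "the eiffel tower is 330 metres tall"),
        Item("text/paragraph", "rag grounds model answers in retrieved passages"),
    ],
    "image/single": [
        Item("image/single", "photo of the eiffel tower at night"),
    ],
    "video/clip": [
        Item("video/clip", "clip showing how to tie a bowline knot"),
    ],
}


def route(query: str) -> str:
    """Toy keyword router (an assumption) standing in for a learned or prompted router."""
    q = query.lower()
    if any(w in q for w in ("look like", "photo", "picture")):
        return "image/single"
    if any(w in q for w in ("how to", "show me", "video")):
        return "video/clip"
    return "text/paragraph"


def retrieve(query: str, corpus: str, k: int = 1) -> list[Item]:
    """Rank items in ONE corpus by word overlap, a stand-in for embedding similarity."""
    q_words = set(query.lower().split())
    ranked = sorted(
        CORPORA[corpus],
        key=lambda item: len(q_words & set(item.text.split())),
        reverse=True,
    )
    return ranked[:k]


def universal_retrieve(query: str, k: int = 1) -> list[Item]:
    """Route first, then search only the selected corpus."""
    return retrieve(query, route(query), k)
```

Because each query is scored only against items in the corpus the router selects, items from different modalities are never ranked against one another in a shared embedding space, which is the situation the abstract associates with the modality gap.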
