
ViDoRe V3: A Comprehensive Evaluation of Retrieval Augmented Generation in Complex Real-World Scenarios

January 13, 2026
Authors: António Loison, Quentin Macé, Antoine Edy, Victor Xing, Tom Balough, Gabriel Moreira, Bo Liu, Manuel Faysse, Céline Hudelot, Gautier Viaud
cs.AI

Abstract

Retrieval-Augmented Generation (RAG) pipelines must address challenges beyond simple single-document retrieval, such as interpreting visual elements (tables, charts, images), synthesizing information across documents, and providing accurate source grounding. Existing benchmarks fail to capture this complexity, often focusing on textual data, single-document comprehension, or evaluating retrieval and generation in isolation. We introduce ViDoRe v3, a comprehensive multimodal RAG benchmark featuring multi-type queries over visually rich document corpora. It covers 10 datasets across diverse professional domains, comprising ~26,000 document pages paired with 3,099 human-verified queries, each available in 6 languages. Through 12,000 hours of human annotation effort, we provide high-quality annotations for retrieval relevance, bounding box localization, and verified reference answers. Our evaluation of state-of-the-art RAG pipelines reveals that visual retrievers outperform textual ones, late-interaction models and textual reranking substantially improve performance, and hybrid or purely visual contexts enhance answer generation quality. However, current models still struggle with non-textual elements, open-ended queries, and fine-grained visual grounding. To encourage progress in addressing these challenges, the benchmark is released under a commercially permissive license at https://hf.co/vidore.
January 15, 2026