RAG-Anything: All-in-One RAG Framework
October 14, 2025
Authors: Zirui Guo, Xubin Ren, Lingrui Xu, Jiahao Zhang, Chao Huang
cs.AI
Abstract
Retrieval-Augmented Generation (RAG) has emerged as a fundamental paradigm
for expanding Large Language Models beyond their static training limitations.
However, a critical misalignment exists between current RAG capabilities and
real-world information environments. Modern knowledge repositories are
inherently multimodal, containing rich combinations of textual content, visual
elements, structured tables, and mathematical expressions. Yet existing RAG
frameworks are limited to textual content, creating fundamental gaps when
processing multimodal documents. We present RAG-Anything, a unified framework
that enables comprehensive knowledge retrieval across all modalities. Our
approach reconceptualizes multimodal content as interconnected knowledge
entities rather than isolated data types. The framework introduces dual-graph
construction to capture both cross-modal relationships and textual semantics
within a unified representation. We develop cross-modal hybrid retrieval that
combines structural knowledge navigation with semantic matching. This enables
effective reasoning over heterogeneous content where relevant evidence spans
multiple modalities. RAG-Anything demonstrates superior performance on
challenging multimodal benchmarks, achieving significant improvements over
state-of-the-art methods. Performance gains become particularly pronounced on
long documents where traditional approaches fail. Our framework establishes a
new paradigm for multimodal knowledge access, eliminating the architectural
fragmentation that constrains current systems. Our framework is open-sourced
at: https://github.com/HKUDS/RAG-Anything.
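
The idea of combining structural knowledge navigation with semantic matching can be illustrated with a small toy sketch. This is not the RAG-Anything implementation or its API: the node names, vectors, edges, the `hybrid_retrieve` function, and the additive `structure_bonus` scoring rule are all assumptions made up for illustration. The sketch scores every node by cosine similarity to a query vector, then adds a bonus to nodes reachable within a few hops of some seed entities in a graph that links items of different modalities.

```python
import math

# Toy knowledge graph. Every node, edge, and vector is hypothetical.
NODES = {
    "text:intro":   {"modality": "text",     "vec": [0.9, 0.1, 0.0]},
    "fig:accuracy": {"modality": "image",    "vec": [0.1, 0.9, 0.1]},
    "table:scores": {"modality": "table",    "vec": [0.2, 0.8, 0.3]},
    "eq:loss":      {"modality": "equation", "vec": [0.0, 0.2, 0.9]},
}
EDGES = {
    "text:intro":   ["fig:accuracy"],
    "fig:accuracy": ["text:intro", "table:scores"],
    "table:scores": ["fig:accuracy"],
    "eq:loss":      [],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_retrieve(query_vec, seeds, top_k=3, hops=1, structure_bonus=0.5):
    """Rank nodes by semantic similarity plus a bonus for graph proximity.

    The additive bonus is an assumed fusion rule, chosen only for simplicity.
    """
    # Semantic matching: similarity of the query to every node embedding.
    scores = {name: cosine(query_vec, data["vec"]) for name, data in NODES.items()}

    # Structural navigation: breadth-first expansion from the seed entities.
    reached = set(seeds)
    frontier = set(seeds)
    for _ in range(hops):
        frontier = {nb for n in frontier for nb in EDGES.get(n, [])} - reached
        reached |= frontier

    # Fuse the two signals by boosting structurally reachable nodes.
    for name in reached:
        scores[name] += structure_bonus

    # Sort by descending score, breaking ties by name for determinism.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:top_k]


query = [0.1, 0.9, 0.2]
print(hybrid_retrieve(query, seeds=["fig:accuracy"]))
print(hybrid_retrieve(query, seeds=[]))
```

In this toy data, seeding from the figure node lifts the linked text node above the unconnected equation node, even though the equation is the closer semantic match to the query. With no seeds, the ranking falls back to pure similarity.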