

RAG-Anything: All-in-One RAG Framework

October 14, 2025
Authors: Zirui Guo, Xubin Ren, Lingrui Xu, Jiahao Zhang, Chao Huang
cs.AI

Abstract

Retrieval-Augmented Generation (RAG) has emerged as a fundamental paradigm for expanding Large Language Models beyond their static training limitations. However, a critical misalignment exists between current RAG capabilities and real-world information environments. Modern knowledge repositories are inherently multimodal, containing rich combinations of textual content, visual elements, structured tables, and mathematical expressions. Yet existing RAG frameworks are limited to textual content, creating fundamental gaps when processing multimodal documents. We present RAG-Anything, a unified framework that enables comprehensive knowledge retrieval across all modalities. Our approach reconceptualizes multimodal content as interconnected knowledge entities rather than isolated data types. The framework introduces dual-graph construction to capture both cross-modal relationships and textual semantics within a unified representation. We develop cross-modal hybrid retrieval that combines structural knowledge navigation with semantic matching. This enables effective reasoning over heterogeneous content where relevant evidence spans multiple modalities. RAG-Anything demonstrates superior performance on challenging multimodal benchmarks, achieving significant improvements over state-of-the-art methods. Performance gains become particularly pronounced on long documents where traditional approaches fail. Our framework establishes a new paradigm for multimodal knowledge access, eliminating the architectural fragmentation that constrains current systems. Our framework is open-sourced at: https://github.com/HKUDS/RAG-Anything.
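
To make the dual-graph construction and cross-modal hybrid retrieval described in the abstract more concrete, here is a minimal, self-contained Python sketch. It is a conceptual illustration only: every name in it (ModalEntity, build_dual_graph, hybrid_retrieve, the toy embed function) is hypothetical and does not reflect the actual RAG-Anything API; consult the linked repository for the real implementation.

```python
# Conceptual sketch of "dual-graph + cross-modal hybrid retrieval" as described in
# the abstract. All names and data structures here are hypothetical illustrations,
# NOT the RAG-Anything API.
from dataclasses import dataclass
from collections import Counter
import math

@dataclass
class ModalEntity:
    """A knowledge entity extracted from a document: text, table, figure, or equation."""
    eid: str
    modality: str   # "text" | "table" | "figure" | "equation"
    content: str    # textual surrogate (chunk text, caption, cell text, LaTeX, ...)
    section: str    # structural anchor used for cross-modal linking

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use a dense encoder."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_dual_graph(entities):
    """Return (cross_modal_graph, semantic_graph) as adjacency dicts.

    cross-modal graph: edges between entities of different modalities that share a section
    semantic graph:    edges between text entities whose contents are similar
    """
    cross = {e.eid: set() for e in entities}
    semantic = {e.eid: set() for e in entities}
    for i, a in enumerate(entities):
        for b in entities[i + 1:]:
            if a.section == b.section and a.modality != b.modality:
                cross[a.eid].add(b.eid); cross[b.eid].add(a.eid)
            if a.modality == b.modality == "text" and cosine(embed(a.content), embed(b.content)) > 0.2:
                semantic[a.eid].add(b.eid); semantic[b.eid].add(a.eid)
    return cross, semantic

def hybrid_retrieve(query, entities, cross, semantic, k=3):
    """Seed with semantic matching, then expand one hop through both graphs."""
    qv = embed(query)
    seeds = sorted(entities, key=lambda e: cosine(qv, embed(e.content)), reverse=True)[:k]
    hits = {e.eid for e in seeds}
    for e in seeds:  # graph expansion pulls in evidence from other modalities
        hits |= cross[e.eid] | semantic[e.eid]
    return [e for e in entities if e.eid in hits]

if __name__ == "__main__":
    doc = [
        ModalEntity("t1", "text", "revenue grew 20 percent driven by cloud services", "sec3"),
        ModalEntity("tab1", "table", "quarterly revenue cloud services 2024", "sec3"),
        ModalEntity("fig1", "figure", "chart of revenue growth by segment", "sec3"),
        ModalEntity("t2", "text", "the appendix lists hardware specifications", "appendix"),
    ]
    cross, semantic = build_dual_graph(doc)
    for e in hybrid_retrieve("how did cloud revenue grow", doc, cross, semantic, k=1):
        print(e.eid, e.modality)
```

The sketch mirrors the two ingredients the abstract names: one graph capturing cross-modal relationships (entities of different modalities linked through a shared structural anchor), one capturing textual semantics (similarity edges between text entities), and a retrieval step that seeds on semantic matches and then navigates the graphs so that a matching text chunk also brings along the table and figure that support it.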