RAG-Anything: All-in-One RAG Framework

Abstract

Retrieval-Augmented Generation (RAG) has emerged as a fundamental paradigm for extending Large Language Models (LLMs) beyond the limits of their static training data. However, a critical misalignment exists between current RAG capabilities and real-world information environments. Modern knowledge repositories are inherently multimodal, containing rich combinations of textual content, visual elements, structured tables, and mathematical expressions. Yet existing RAG frameworks are limited to textual content, creating fundamental gaps when processing multimodal documents. We present RAG-Anything, a unified framework that enables comprehensive knowledge retrieval across all modalities. Our approach reconceptualizes multimodal content as interconnected knowledge entities rather than isolated data types. The framework introduces dual-graph construction to capture both cross-modal relationships and textual semantics within a unified representation. We develop cross-modal hybrid retrieval that combines structural knowledge navigation with semantic matching. This enables effective reasoning over heterogeneous content where relevant evidence spans multiple modalities. RAG-Anything demonstrates superior performance on challenging multimodal benchmarks, achieving significant improvements over state-of-the-art methods. Performance gains become particularly pronounced on long documents, where traditional approaches fail. Our framework establishes a new paradigm for multimodal knowledge access, eliminating the architectural fragmentation that constrains current systems. RAG-Anything is open-sourced at: https://github.com/HKUDS/RAG-Anything.
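The abstract does not specify the graph schema, the scoring functions, or how the two retrieval signals are combined. The following is therefore a minimal, hypothetical sketch of the general idea, not RAG-Anything's actual code or API. It assumes every node (text chunk, image, table, equation) already carries an embedding, merges a text-entity graph and a cross-modal graph into one undirected adjacency map (a stand-in for "dual-graph construction"), selects seed nodes by cosine similarity to the query (semantic matching), expands them by breadth-first search (structural navigation), and fuses both signals with a weight `alpha`. All names (`Node`, `merge_graphs`, `hybrid_retrieve`, `k_seed`, `hops`, `alpha`, `top_k`) and the toy data are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    modality: str            # hypothetical labels: "text", "image", "table", "equation"
    embedding: list[float]   # assumed to be precomputed by some encoder


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def merge_graphs(*graphs: dict[str, set[str]]) -> dict[str, set[str]]:
    """Union several adjacency maps into one undirected graph keyed by node id."""
    merged: dict[str, set[str]] = {}
    for g in graphs:
        for u, nbrs in g.items():
            merged.setdefault(u, set()).update(nbrs)
            for v in nbrs:
                merged.setdefault(v, set()).add(u)  # add the reverse edge
    return merged


def hybrid_retrieve(query_emb, nodes, graph, k_seed=2, hops=1, alpha=0.7, top_k=3):
    # 1. Semantic matching: score every node by similarity to the query.
    sem = {nid: cosine(query_emb, n.embedding) for nid, n in nodes.items()}
    seeds = sorted(sem, key=sem.get, reverse=True)[:k_seed]

    # 2. Structural navigation: breadth-first expansion from the seeds,
    #    recording each reached node's hop distance.
    dist = {s: 0 for s in seeds}
    frontier = list(seeds)
    for d in range(1, hops + 1):
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                if v not in dist:
                    dist[v] = d
                    next_frontier.append(v)
        frontier = next_frontier

    # 3. Fusion: weighted sum of the semantic score and a structural score
    #    that decays with hop distance (1 for seeds, 1/2 at one hop, ...).
    scores = {
        nid: alpha * sem.get(nid, 0.0) + (1 - alpha) / (1 + d)
        for nid, d in dist.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Toy data: two text chunks, one image, one table, with 3-d embeddings.
nodes = {
    "t1": Node("t1", "text", [1.0, 0.0, 0.0]),
    "t2": Node("t2", "text", [0.0, 1.0, 0.0]),
    "img1": Node("img1", "image", [0.0, 0.0, 1.0]),
    "tab1": Node("tab1", "table", [0.6, 0.8, 0.0]),
}
text_graph = {"t1": {"t2"}}                           # relations between text entities
cross_modal_graph = {"t1": {"img1"}, "t2": {"tab1"}}  # text <-> non-text links
graph = merge_graphs(text_graph, cross_modal_graph)

query = [1.0, 0.1, 0.0]
ranked = hybrid_retrieve(query, nodes, graph)
print([nid for nid, _ in ranked])  # ['t1', 'tab1', 't2']
```

In this toy run, `img1` has zero embedding similarity to the query, yet it still enters the candidate set (and is returned with `top_k=4`) because the cross-modal graph links it to the seed `t1`; a purely embedding-based retriever would never surface it. Linear fusion with a distance decay is only one simple design choice, and in this sketch nodes unreachable from a seed are never scored; a real system might instead union the semantic and structural candidate sets before ranking.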