NodeRAG: Structuring Graph-based RAG with Heterogeneous Nodes

Abstract

Retrieval-augmented generation (RAG) empowers large language models to access external and private corpora, enabling factually consistent responses in specific domains. By exploiting the inherent structure of the corpus, graph-based RAG methods further enrich this process by building a knowledge graph index and leveraging the structural nature of graphs. However, current graph-based RAG approaches seldom prioritize the design of graph structures. Inadequately designed graphs not only impede the seamless integration of diverse graph algorithms but also result in workflow inconsistencies and degraded performance. To further unleash the potential of graphs for RAG, we propose NodeRAG, a graph-centric framework that introduces heterogeneous graph structures, enabling the seamless and holistic integration of graph-based methodologies into the RAG workflow. By aligning closely with the capabilities of LLMs, this framework ensures a fully cohesive and efficient end-to-end process. Through extensive experiments, we demonstrate that NodeRAG exhibits performance advantages over previous methods, including GraphRAG and LightRAG, not only in indexing time, query time, and storage efficiency but also in question-answering performance on multi-hop benchmarks and open-ended head-to-head evaluations, while using minimal retrieval tokens. Our GitHub repository is available at https://github.com/Terry-Xu-666/NodeRAG.
