

A Decentralized Retrieval Augmented Generation System with Source Reliabilities Secured on Blockchain

November 10, 2025
Authors: Yining Lu, Wenyi Tang, Max Johnson, Taeho Jung, Meng Jiang
cs.AI

Abstract

Existing retrieval-augmented generation (RAG) systems typically use a centralized architecture, which makes data collection, integration, and management costly and raises privacy concerns. There is a strong need for a decentralized RAG system that lets foundation models use information directly from data owners, who retain full control over their sources. However, decentralization brings a challenge: the many independent data sources vary significantly in reliability, which can reduce retrieval accuracy and response quality. To address this, our decentralized RAG system introduces a novel reliability scoring mechanism that dynamically evaluates each source based on the quality of the responses it helps generate and prioritizes high-quality sources during retrieval. To ensure transparency and trust, the scoring process is securely managed through blockchain-based smart contracts, creating verifiable and tamper-proof reliability records without relying on a central authority. We evaluate our decentralized system with two Llama models (3B and 8B) in two simulated environments where six data sources have different levels of reliability. Our system achieves a 10.7% performance improvement over its centralized counterpart in real-world-like unreliable data environments. Notably, it approaches the upper-bound performance of centralized systems in ideally reliable data environments. The decentralized infrastructure enables secure and trustworthy score management, achieving approximately 56% marginal cost savings through batched update operations. Our code and system are open-sourced at github.com/yining610/Reliable-dRAG.
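
The abstract does not spell out the scoring rule, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes an exponential-moving-average update, a neutral starting score of 0.5, and a ranking rule that multiplies retrieval similarity by source reliability. The `SourceRegistry` class is a hypothetical in-memory stand-in for the on-chain reliability records, and `update_batch` only mirrors the notion of grouping several score updates into one operation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SourceRegistry:
    """Hypothetical in-memory stand-in for on-chain reliability records."""

    alpha: float = 0.2  # assumed smoothing factor, not taken from the paper
    scores: dict[str, float] = field(default_factory=dict)

    def score(self, source_id: str) -> float:
        # Unseen sources start at a neutral 0.5 (an assumption).
        return self.scores.get(source_id, 0.5)

    def update(self, source_id: str, quality: float) -> None:
        # Exponential moving average of response quality, with quality in [0, 1].
        old = self.score(source_id)
        self.scores[source_id] = (1 - self.alpha) * old + self.alpha * quality

    def update_batch(self, feedback: list[tuple[str, float]]) -> None:
        # Apply several updates in one call, loosely mirroring a batched
        # on-chain transaction that amortizes fixed per-call overhead.
        for source_id, quality in feedback:
            self.update(source_id, quality)


def rerank(
    candidates: list[tuple[str, str, float]],
    registry: SourceRegistry,
    top_k: int = 2,
) -> list[tuple[str, str, float]]:
    """Rank (source_id, passage, similarity) tuples by similarity * reliability."""
    ranked = sorted(
        candidates,
        key=lambda c: c[2] * registry.score(c[0]),
        reverse=True,
    )
    return ranked[:top_k]


registry = SourceRegistry()
# Source "a" contributed to a good response, source "b" to a poor one.
registry.update_batch([("a", 1.0), ("b", 0.0)])

candidates = [
    ("a", "passage from a", 0.7),
    ("b", "passage from b", 0.9),
    ("c", "passage from c", 0.5),
]
top = rerank(candidates, registry)
print([source_id for source_id, _, _ in top])
```

In this toy run, the passage from source "b" has the highest raw similarity but ranks below source "a" once reliability is factored in, which is the kind of prioritization the abstract describes.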