An Agentic System for Rare Disease Diagnosis with Traceable Reasoning

June 25, 2025
Authors: Weike Zhao, Chaoyi Wu, Yanjie Fan, Xiaoman Zhang, Pengcheng Qiu, Yuze Sun, Xiao Zhou, Yanfeng Wang, Ya Zhang, Yongguo Yu, Kun Sun, Weidi Xie
cs.AI

Abstract

Rare diseases collectively affect over 300 million individuals worldwide, yet timely and accurate diagnosis remains a pervasive challenge. This is largely due to their clinical heterogeneity, low individual prevalence, and the limited familiarity most clinicians have with rare conditions. Here, we introduce DeepRare, the first rare disease diagnosis agentic system powered by a large language model (LLM), capable of processing heterogeneous clinical inputs. The system generates ranked diagnostic hypotheses for rare diseases, each accompanied by a transparent chain of reasoning that links intermediate analytic steps to verifiable medical evidence. DeepRare comprises three key components: a central host with a long-term memory module; specialized agent servers responsible for domain-specific analytical tasks, integrating over 40 specialized tools; and web-scale, up-to-date medical knowledge sources that ensure access to the most current clinical information. This modular and scalable design enables complex diagnostic reasoning while maintaining traceability and adaptability. We evaluate DeepRare on eight datasets. The system demonstrates exceptional diagnostic performance across 2,919 diseases, achieving 100% accuracy for 1,013 of them. In HPO-based evaluations, DeepRare significantly outperforms 15 other methods, including traditional bioinformatics diagnostic tools, LLMs, and other agentic systems, achieving an average Recall@1 score of 57.18% and surpassing the second-best method (Reasoning LLM) by a substantial margin of 23.79 percentage points. In multi-modal input scenarios, DeepRare achieves a Recall@1 of 70.60% on 109 cases, compared with 53.20% for Exomiser. Manual verification of the reasoning chains by clinical experts yields 95.40% agreement. Furthermore, the DeepRare system has been implemented as a user-friendly web application, available at http://raredx.cn/doctor.
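The abstract reports Recall@1 without defining it. The sketch below assumes the conventional top-k definition, namely the fraction of cases whose confirmed diagnosis appears among the first k ranked hypotheses; this is not confirmed to be the paper's exact evaluation protocol. The function name `recall_at_k` and the toy cases are hypothetical, and the sketch does not reproduce how the paper averages scores across datasets. The last lines derive the second-best method's average Recall@1 implied by the two figures quoted in the abstract.

```python
from typing import Sequence


def recall_at_k(
    ranked_predictions: Sequence[Sequence[str]],
    true_labels: Sequence[str],
    k: int = 1,
) -> float:
    """Fraction of cases whose true diagnosis is within the top-k hypotheses.

    Assumes the standard top-k definition; the abstract does not spell it out.
    """
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        raise ValueError("at least one case is required")
    hits = sum(
        1
        for preds, truth in zip(ranked_predictions, true_labels)
        if truth in preds[:k]
    )
    return hits / len(true_labels)


# Hypothetical toy cases (not taken from the paper): each inner list is a
# ranked list of diagnostic hypotheses for one patient.
cases = [
    ["Marfan syndrome", "Loeys-Dietz syndrome"],
    ["Fabry disease", "Gaucher disease"],
    ["Gaucher disease", "Niemann-Pick disease"],
    ["Rett syndrome", "Angelman syndrome"],
]
truths = [
    "Marfan syndrome",
    "Gaucher disease",
    "Gaucher disease",
    "Angelman syndrome",
]

recall_1 = recall_at_k(cases, truths, k=1)  # 2 of 4 cases hit at rank 1
recall_2 = recall_at_k(cases, truths, k=2)  # all 4 cases hit within rank 2

# Arithmetic implied by the abstract: DeepRare's average Recall@1 (57.18%)
# minus the reported margin (23.79 points) gives the second-best method's score.
second_best = round(57.18 - 23.79, 2)

print(recall_1, recall_2, second_best)
```

Under these assumptions, the derived figure of 33.39% for the second-best method is a consequence of the two numbers in the abstract rather than a value the abstract states directly.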