An Agentic System for Rare Disease Diagnosis with Traceable Reasoning

June 25, 2025
Authors: Weike Zhao, Chaoyi Wu, Yanjie Fan, Xiaoman Zhang, Pengcheng Qiu, Yuze Sun, Xiao Zhou, Yanfeng Wang, Ya Zhang, Yongguo Yu, Kun Sun, Weidi Xie
cs.AI

Abstract

Rare diseases collectively affect over 300 million individuals worldwide, yet timely and accurate diagnosis remains a pervasive challenge. This is largely due to their clinical heterogeneity, low individual prevalence, and the limited familiarity most clinicians have with rare conditions. Here, we introduce DeepRare, the first agentic system for rare disease diagnosis powered by a large language model (LLM) and capable of processing heterogeneous clinical inputs. The system generates ranked diagnostic hypotheses for rare diseases, each accompanied by a transparent chain of reasoning that links intermediate analytic steps to verifiable medical evidence. DeepRare comprises three key components: a central host with a long-term memory module; specialized agent servers responsible for domain-specific analytical tasks; and over 40 specialized tools together with web-scale, up-to-date medical knowledge sources, which the agent servers integrate to ensure access to the most current clinical information. This modular and scalable design enables complex diagnostic reasoning while maintaining traceability and adaptability. We evaluate DeepRare on eight datasets. Across 2,919 diseases, the system demonstrates exceptional diagnostic performance, achieving 100% accuracy for 1,013 of them. In Human Phenotype Ontology (HPO)-based evaluations, DeepRare significantly outperforms 15 other methods, including traditional bioinformatics diagnostic tools, LLMs, and other agentic systems, achieving an average Recall@1 of 57.18% and surpassing the second-best method (Reasoning LLM) by a substantial margin of 23.79 percentage points. In multi-modal input scenarios, DeepRare achieves a Recall@1 of 70.60% on 109 cases, compared with 53.20% for Exomiser. Manual verification of the reasoning chains by clinical experts shows 95.40% agreement. Furthermore, DeepRare has been implemented as a user-friendly web application at http://raredx.cn/doctor.
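The results above are reported as Recall@1, the share of cases in which the correct disease is the top-ranked hypothesis; Recall@k generalizes this to the top k hypotheses. The following is a minimal Python sketch of that standard definition, assuming one gold diagnosis per case matched by exact string comparison. The function name `recall_at_k` and the toy cases are hypothetical illustrations, not the authors' evaluation code or data; the paper may normalize disease names or handle multiple gold labels differently.

```python
from typing import Sequence


def recall_at_k(ranked_predictions: Sequence[Sequence[str]],
                gold_labels: Sequence[str],
                k: int = 1) -> float:
    """Fraction of cases whose gold diagnosis appears among the top-k ranked hypotheses.

    Assumption (hypothetical helper): one gold label per case, exact string match.
    """
    if len(ranked_predictions) != len(gold_labels):
        raise ValueError("need exactly one ranked list per case")
    if not gold_labels:
        raise ValueError("no cases to evaluate")
    if k < 1:
        raise ValueError("k must be >= 1")
    hits = sum(
        1
        for ranking, gold in zip(ranked_predictions, gold_labels)
        if gold in ranking[:k]  # a hit if the correct disease is ranked within the first k
    )
    return hits / len(gold_labels)


# Toy, hypothetical cases (not from the paper): one ranked hypothesis list per patient.
predictions = [
    ["Marfan syndrome", "Loeys-Dietz syndrome", "Ehlers-Danlos syndrome"],
    ["Fabry disease", "Gaucher disease", "Pompe disease"],
    ["Rett syndrome", "Angelman syndrome", "Prader-Willi syndrome"],
    ["Wilson disease", "Hemochromatosis", "Alpha-1 antitrypsin deficiency"],
]
gold = ["Marfan syndrome", "Gaucher disease", "Rett syndrome", "Pompe disease"]

print(recall_at_k(predictions, gold, k=1))  # 0.5  (cases 1 and 3 ranked first)
print(recall_at_k(predictions, gold, k=3))  # 0.75 (case 2 also hits within the top 3)
```

Because the 23.79-point margin is an absolute difference in average Recall@1, it implies that the second-best method reaches roughly 57.18 - 23.79 = 33.39%.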