Towards a Medical AI Scientist

Abstract

Autonomous systems that generate scientific hypotheses, conduct experiments, and draft manuscripts have recently emerged as a promising paradigm for accelerating scientific discovery. However, existing AI Scientists remain largely domain-agnostic, which limits their applicability to clinical medicine, where research must be grounded in medical evidence and often relies on specialized data modalities. In this work, we introduce the Medical AI Scientist, the first autonomous research framework tailored to clinical research. It enables clinically grounded ideation by transforming extensively surveyed literature into actionable evidence through a clinician-engineer co-reasoning mechanism, which improves the traceability of the generated research ideas. It further supports evidence-grounded manuscript drafting guided by structured medical writing conventions and ethical policies. The framework operates in three research modes: paper-based reproduction, literature-inspired innovation, and task-driven exploration. Each mode corresponds to a distinct level of automated scientific inquiry, with autonomy increasing progressively across the three. Across 171 cases spanning 19 clinical tasks and 6 data modalities, comprehensive evaluations by both large language models (LLMs) and human experts show that the ideas generated by the Medical AI Scientist are of substantially higher quality than those produced by commercial LLMs. Our system also achieves strong alignment between the proposed methods and their implementations, and attains significantly higher success rates in executable experiments. Double-blind evaluations by human experts and the Stanford Agentic Reviewer suggest that the generated manuscripts approach the quality of MICCAI papers while consistently surpassing those published at ISBI and BIBM. The Medical AI Scientist highlights the potential of AI for autonomous scientific discovery in healthcare.