Towards a Medical AI Scientist

Abstract

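In recent years, automated systems that autonomously generate scientific hypotheses, run experiments, and write papers have emerged as a new paradigm for accelerating scientific discovery. However, most existing AI Scientists remain domain-agnostic and are difficult to apply to clinical research, which must be grounded in medical evidence and involves specialized data modalities. To address this, we propose Medical AI Scientist, the first framework dedicated to autonomous clinical research. Through a clinician-engineer co-reasoning mechanism, the framework turns the results of literature surveys into actionable evidence, enabling clinically grounded ideation and improving the traceability of research ideas. Guided by structured medical writing conventions and ethical guidelines, the system can also draft evidence-grounded manuscripts. The framework offers three research modes: paper-based reproduction, literature-inspired innovation, and task-driven exploration, which correspond to levels of scientific inquiry with progressively increasing automation. In a comprehensive evaluation by large language models and human experts covering 171 cases, 19 clinical tasks, and 6 data modalities, the research ideas generated by Medical AI Scientist were of significantly higher quality than those produced by commercial large language models. The system also shows strong consistency between method design and experimental execution, and a markedly higher success rate for executable experiments. Double-blind evaluation by human experts and the Stanford Agentic Reviewer indicates that the manuscripts generated by the framework approach MICCAI-level quality and consistently surpass the quality of ISBI and BIBM papers. Medical AI Scientist demonstrates the potential of artificial intelligence to achieve autonomous scientific discovery in healthcare.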

English

Autonomous systems that generate scientific hypotheses, conduct experiments, and draft manuscripts have recently emerged as a promising paradigm for accelerating discovery. However, existing AI Scientists remain largely domain-agnostic, which limits their applicability to clinical medicine, where research must be grounded in medical evidence and involves specialized data modalities. In this work, we introduce Medical AI Scientist, the first autonomous research framework tailored to clinical research. It enables clinically grounded ideation by transforming extensively surveyed literature into actionable evidence through a clinician-engineer co-reasoning mechanism, which improves the traceability of generated research ideas. It further facilitates evidence-grounded manuscript drafting guided by structured medical writing conventions and ethical policies. The framework operates under three research modes, namely paper-based reproduction, literature-inspired innovation, and task-driven exploration, each corresponding to a distinct level of automated scientific inquiry with progressively increasing autonomy. Comprehensive evaluations by both large language models and human experts, spanning 171 cases, 19 clinical tasks, and 6 data modalities, demonstrate that the ideas generated by Medical AI Scientist are of substantially higher quality than those produced by commercial LLMs. Our system also achieves strong alignment between the proposed method and its implementation, and significantly higher success rates in executable experiments. Double-blind evaluations by human experts and the Stanford Agentic Reviewer suggest that the generated manuscripts approach MICCAI-level quality and consistently surpass those from ISBI and BIBM. Medical AI Scientist highlights the potential of leveraging AI for autonomous scientific discovery in healthcare.


Towards a Medical AI Scientist
