Towards a Medical AI Scientist

Abstract

Autonomous systems that generate scientific hypotheses, conduct experiments, and draft manuscripts have recently emerged as a promising paradigm for accelerating discovery. However, existing AI Scientists remain largely domain-agnostic, which limits their applicability to clinical medicine, where research must be grounded in medical evidence and specialized data modalities. In this work, we introduce Medical AI Scientist, the first autonomous research framework tailored to clinical research. It enables clinically grounded ideation by transforming extensively surveyed literature into actionable evidence through a clinician-engineer co-reasoning mechanism, which improves the traceability of the generated research ideas. It further facilitates evidence-grounded manuscript drafting guided by structured medical writing conventions and ethical policies. The framework operates in three research modes, namely paper-based reproduction, literature-inspired innovation, and task-driven exploration, each corresponding to a distinct level of automated scientific inquiry with progressively increasing autonomy. Comprehensive evaluations by both large language models and human experts demonstrate that, across 171 cases, 19 clinical tasks, and 6 data modalities, the ideas generated by Medical AI Scientist are of substantially higher quality than those produced by commercial LLMs. Our system also achieves strong alignment between the proposed method and its implementation, and it attains significantly higher success rates in executing experiments. Double-blind evaluations by human experts and the Stanford Agentic Reviewer suggest that the generated manuscripts approach MICCAI-level quality while consistently surpassing papers from ISBI and BIBM. Medical AI Scientist highlights the potential of leveraging AI for autonomous scientific discovery in healthcare.
