Towards a Medical AI Scientist

Abstract

Autonomous systems that generate scientific hypotheses, conduct experiments, and draft manuscripts have recently emerged as a promising paradigm for accelerating discovery. However, existing AI Scientists remain largely domain-agnostic, which limits their applicability to clinical medicine, where research must be grounded in medical evidence and specialized data modalities. In this work, we introduce Medical AI Scientist, the first autonomous research framework tailored to autonomous clinical research. It enables clinically grounded ideation by transforming extensively surveyed literature into actionable evidence through a clinician-engineer co-reasoning mechanism, which improves the traceability of the generated research ideas. It further facilitates evidence-grounded manuscript drafting guided by structured medical writing conventions and ethical policies. The framework operates in three research modes, namely paper-based reproduction, literature-inspired innovation, and task-driven exploration, each corresponding to a distinct level of automated scientific inquiry with progressively increasing autonomy. Comprehensive evaluations by both large language models (LLMs) and human experts demonstrate that the ideas generated by Medical AI Scientist are of substantially higher quality than those produced by commercial LLMs across 171 cases, 19 clinical tasks, and 6 data modalities. The system also achieves strong alignment between the proposed method and its implementation, and it shows significantly higher success rates in executable experiments. Double-blind evaluations by human experts and the Stanford Agentic Reviewer suggest that the generated manuscripts approach MICCAI-level quality while consistently surpassing those from ISBI and BIBM. The proposed Medical AI Scientist highlights the potential of leveraging AI for autonomous scientific discovery in healthcare.
