Towards Generalist Biomedical AI

Abstract

Medicine is inherently multimodal, with rich data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence (AI) systems that flexibly encode, integrate, and interpret this data at scale can potentially enable impactful applications ranging from scientific discovery to care delivery. To enable the development of these models, we first curate MultiMedBench, a new multimodal biomedical benchmark. MultiMedBench encompasses 14 diverse tasks such as medical question answering, mammography and dermatology image interpretation, radiology report generation and summarization, and genomic variant calling. We then introduce Med-PaLM Multimodal (Med-PaLM M), our proof of concept for a generalist biomedical AI system. Med-PaLM M is a large multimodal generative model that flexibly encodes and interprets biomedical data including clinical language, imaging, and genomics with the same set of model weights. Med-PaLM M reaches performance competitive with or exceeding the state of the art on all MultiMedBench tasks, often surpassing specialist models by a wide margin. We also report examples of zero-shot generalization to novel medical concepts and tasks, positive transfer learning across tasks, and emergent zero-shot medical reasoning. To further probe the capabilities and limitations of Med-PaLM M, we conduct a radiologist evaluation of model-generated (and human) chest X-ray reports and observe encouraging performance across model scales. In a side-by-side ranking on 246 retrospective chest X-rays, clinicians express a pairwise preference for Med-PaLM M reports over those produced by radiologists in up to 40.50% of cases, suggesting potential clinical utility. While considerable work is needed to validate these models in real-world use cases, our results represent a milestone towards the development of generalist biomedical AI systems.
