Towards Generalist Biomedical AI

Abstract

Medicine is inherently multimodal, with rich data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence (AI) systems that flexibly encode, integrate, and interpret this data at scale can potentially enable impactful applications ranging from scientific discovery to care delivery. To enable the development of these models, we first curate MultiMedBench, a new multimodal biomedical benchmark. MultiMedBench encompasses 14 diverse tasks such as medical question answering, mammography and dermatology image interpretation, radiology report generation and summarization, and genomic variant calling. We then introduce Med-PaLM Multimodal (Med-PaLM M), our proof of concept for a generalist biomedical AI system. Med-PaLM M is a large multimodal generative model that flexibly encodes and interprets biomedical data including clinical language, imaging, and genomics with the same set of model weights. Med-PaLM M reaches performance competitive with or exceeding the state of the art on all MultiMedBench tasks, often surpassing specialist models by a wide margin. We also report examples of zero-shot generalization to novel medical concepts and tasks, positive transfer learning across tasks, and emergent zero-shot medical reasoning. To further probe the capabilities and limitations of Med-PaLM M, we conduct a radiologist evaluation of model-generated (and human) chest X-ray reports and observe encouraging performance across model scales. In a side-by-side ranking on 246 retrospective chest X-rays, clinicians express a pairwise preference for Med-PaLM M reports over those produced by radiologists in up to 40.50% of cases, suggesting potential clinical utility. While considerable work is needed to validate these models in real-world use cases, our results represent a milestone towards the development of generalist biomedical AI systems.
