
Capabilities of Gemini Models in Medicine

April 29, 2024
作者: Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, Ellery Wulczyn, Fan Zhang, Tim Strother, Chunjong Park, Elahe Vedadi, Juanma Zambrano Chaves, Szu-Yeu Hu, Mike Schaekermann, Aishwarya Kamath, Yong Cheng, David G. T. Barrett, Cathy Cheung, Basil Mustafa, Anil Palepu, Daniel McDuff, Le Hou, Tomer Golany, Luyang Liu, Jean-baptiste Alayrac, Neil Houlsby, Nenad Tomasev, Jan Freyberg, Charles Lau, Jonas Kemp, Jeremy Lai, Shekoofeh Azizi, Kimberly Kanada, SiWai Man, Kavita Kulkarni, Ruoxi Sun, Siamak Shakeri, Luheng He, Ben Caine, Albert Webson, Natasha Latysheva, Melvin Johnson, Philip Mansfield, Jian Lu, Ehud Rivlin, Jesper Anderson, Bradley Green, Renee Wong, Jonathan Krause, Jonathon Shlens, Ewa Dominowska, S. M. Ali Eslami, Claire Cui, Oriol Vinyals, Koray Kavukcuoglu, James Manyika, Jeff Dean, Demis Hassabis, Yossi Matias, Dale Webster, Joelle Barral, Greg Corrado, Christopher Semturs, S. Sara Mahdavi, Juraj Gottweis, Alan Karthikesalingam, Vivek Natarajan
cs.AI

Abstract

Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-Gemini, a family of highly capable multimodal models that are specialized in medicine with the ability to seamlessly use web search, and that can be efficiently tailored to novel modalities using custom encoders. We evaluate Med-Gemini on 14 medical benchmarks, establishing new state-of-the-art (SoTA) performance on 10 of them, and surpass the GPT-4 model family on every benchmark where a direct comparison is viable, often by a wide margin. On the popular MedQA (USMLE) benchmark, our best-performing Med-Gemini model achieves SoTA performance of 91.1% accuracy, using a novel uncertainty-guided search strategy. On 7 multimodal benchmarks including NEJM Image Challenges and MMMU (health & medicine), Med-Gemini improves over GPT-4V by an average relative margin of 44.5%. We demonstrate the effectiveness of Med-Gemini's long-context capabilities through SoTA performance on a needle-in-a-haystack retrieval task from long de-identified health records and medical video question answering, surpassing prior bespoke methods using only in-context learning. Finally, Med-Gemini's performance suggests real-world utility by surpassing human experts on tasks such as medical text summarization, alongside demonstrations of promising potential for multimodal medical dialogue, medical research and education. Taken together, our results offer compelling evidence for Med-Gemini's potential, although further rigorous evaluation will be crucial before real-world deployment in this safety-critical domain.
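The abstract names a novel uncertainty-guided search strategy behind the 91.1% MedQA result but does not specify it. Purely as an illustrative sketch of the general idea (the function names, sample count, and agreement threshold below are invented assumptions, not the paper's method), one could gate a web-search fallback on how much stochastically sampled answers disagree:

```python
from collections import Counter
from itertools import cycle

def uncertainty_guided_answer(question, sample_answer, search_and_answer,
                              n_samples=11, agreement_threshold=0.8):
    """Answer a question, falling back to search when samples disagree.

    Draws `n_samples` stochastic answers; if the majority answer's vote
    share falls below `agreement_threshold`, the model is treated as
    uncertain and a search-grounded answer is returned instead.
    """
    votes = Counter(sample_answer(question) for _ in range(n_samples))
    top_answer, top_count = votes.most_common(1)[0]
    if top_count / n_samples >= agreement_threshold:
        return top_answer, "direct"
    return search_and_answer(question), "search"

# Toy stand-ins for a model and a retrieval step (not real APIs).
confident = lambda q: "A"                 # samples always agree
_cycler = cycle(["A", "B", "C"])
unsure = lambda q: next(_cycler)          # samples disagree
search_answer = lambda q: "B"             # pretend retrieval settles it

print(uncertainty_guided_answer("q1", confident, search_answer))  # ('A', 'direct')
print(uncertainty_guided_answer("q2", unsure, search_answer))     # ('B', 'search')
```

Using self-consistency-style vote share as the uncertainty signal keeps the sketch simple; the actual model could use any calibrated confidence measure to decide when retrieval is worth the extra cost.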
