WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation

October 16, 2024
Authors: João Matos, Shan Chen, Siena Placino, Yingya Li, Juan Carlos Climent Pardo, Daphna Idan, Takeshi Tohyama, David Restrepo, Luis F. Nakayama, Jose M. M. Pascual-Leone, Guergana Savova, Hugo Aerts, Leo A. Celi, A. Ian Wong, Danielle S. Bitterman, Jack Gallifant
cs.AI

Abstract

Multimodal/vision language models (VLMs) are increasingly being deployed in healthcare settings worldwide, necessitating robust benchmarks to ensure their safety, efficacy, and fairness. Multiple-choice question-and-answer (QA) datasets derived from national medical examinations have long served as valuable evaluation tools, but existing datasets are largely text-only and available only in a limited subset of languages and countries. To address these challenges, we present WorldMedQA-V, an updated multilingual, multimodal benchmarking dataset designed to evaluate VLMs in healthcare. WorldMedQA-V includes 568 labeled multiple-choice QAs, each paired with a medical image, from four countries (Brazil, Israel, Japan, and Spain), covering the original languages as well as English translations validated by native clinicians. Baseline performance for common open- and closed-source models is provided in the local language and in English translation, both with and without images provided to the model. The WorldMedQA-V benchmark aims to better match AI systems to the diverse healthcare environments in which they are deployed, fostering more equitable, effective, and representative applications.
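
The abstract describes four evaluation conditions: original language vs. validated English translation, each with and without the paired medical image. The sketch below illustrates that protocol under stated assumptions; the Hub ID, split name, and field names (`question`, `question_en`, `options`, `image`, `answer`) are hypothetical placeholders, not the dataset's confirmed schema, and the model call is a stub to be replaced with a real VLM.

```python
# Minimal evaluation sketch for a WorldMedQA-V-style benchmark.
# Dataset ID, split, and field names below are illustrative assumptions.
from datasets import load_dataset

ds = load_dataset("WorldMedQA/V", split="test")  # hypothetical Hub ID and split

def ask_model(question: str, options: list[str], image=None) -> str:
    """Stub VLM call: replace with a real open- or closed-source model.

    Returns a predicted option letter; this stub always answers "A" so the
    script runs end to end as a trivial baseline.
    """
    return "A"

# The four conditions reported in the paper: original language vs. English
# translation, each with and without the paired medical image.
conditions = [
    ("local, with image",   "question",    True),
    ("local, text-only",    "question",    False),
    ("English, with image", "question_en", True),
    ("English, text-only",  "question_en", False),
]

for name, q_field, use_image in conditions:
    correct = sum(
        ask_model(ex[q_field], ex["options"], ex["image"] if use_image else None)
        == ex["answer"]  # assumed gold-label field holding the option letter
        for ex in ds
    )
    print(f"{name}: {correct / len(ds):.1%} accuracy")
```

Comparing accuracy across these four runs separates the effect of language (local vs. English) from the effect of the image modality, which is the comparison the baseline results in the paper are organized around.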
