Med-Flamingo: a Multimodal Medical Few-shot Learner

Abstract

Medicine, by its nature, is a multifaceted domain that requires the synthesis of information across various modalities. Medical generative vision-language models (VLMs) make a first step in this direction and promise many exciting clinical applications. However, existing models typically have to be fine-tuned on sizeable downstream datasets, which poses a significant limitation because in many medical applications data is scarce, necessitating models that are capable of learning from few examples in real time. Here we propose Med-Flamingo, a multimodal few-shot learner adapted to the medical domain. Based on OpenFlamingo-9B, we continue pre-training on paired and interleaved medical image-text data from publications and textbooks. Med-Flamingo unlocks few-shot generative medical visual question answering (VQA) abilities, which we evaluate on several datasets, including a novel, challenging open-ended VQA dataset of visual USMLE-style problems. Furthermore, we conduct the first human evaluation for generative medical VQA, in which physicians review the problems and blinded generations in an interactive app. Med-Flamingo improves performance in generative medical VQA by up to 20% in clinicians' ratings and is the first to enable multimodal medical few-shot adaptations, such as rationale generation. We release our model, code, and evaluation app at https://github.com/snap-stanford/med-flamingo.
