Med-Flamingo: a Multimodal Medical Few-shot Learner

Abstract

Medicine, by its nature, is a multifaceted domain that requires the synthesis of information across various modalities. Medical generative vision-language models (VLMs) take a first step in this direction and promise many exciting clinical applications. However, existing models typically have to be fine-tuned on sizeable downstream datasets. This poses a significant limitation because data is scarce in many medical applications, and it calls for models that can learn from a few examples in real time. Here we propose Med-Flamingo, a multimodal few-shot learner adapted to the medical domain. Based on OpenFlamingo-9B, we continue pre-training on paired and interleaved medical image-text data from publications and textbooks. Med-Flamingo unlocks few-shot generative medical visual question answering (VQA) abilities. We evaluate these abilities on several datasets, including a novel, challenging open-ended VQA dataset of visual USMLE-style problems. Furthermore, we conduct the first human evaluation for generative medical VQA, in which physicians review the problems and blinded generations in an interactive app. Med-Flamingo improves performance in generative medical VQA by up to 20% in clinicians' ratings. It is also the first model to enable multimodal medical few-shot adaptations, such as rationale generation. We release our model, code, and evaluation app at https://github.com/snap-stanford/med-flamingo.