μ-Bench: A Vision-Language Benchmark for Microscopy Understanding
July 1, 2024
Authors: Alejandro Lozano, Jeffrey Nirschl, James Burgess, Sanket Rajan Gupte, Yuhui Zhang, Alyssa Unell, Serena Yeung-Levy
cs.AI
Abstract
Recent advances in microscopy have enabled the rapid generation of terabytes
of image data in cell biology and biomedical research. Vision-language models
(VLMs) offer a promising solution for large-scale biological image analysis,
enhancing researchers' efficiency, identifying new image biomarkers, and
accelerating hypothesis generation and scientific discovery. However, there is
a lack of standardized, diverse, and large-scale vision-language benchmarks to
evaluate VLMs' perception and cognition capabilities in biological image
understanding. To address this gap, we introduce μ-Bench, an expert-curated
benchmark encompassing 22 biomedical tasks across various scientific
disciplines (biology, pathology), microscopy modalities (electron,
fluorescence, light), scales (subcellular, cellular, tissue), and organisms in
both normal and abnormal states. We evaluate state-of-the-art biomedical,
pathology, and general VLMs on μ-Bench and find that: i) current models
struggle on all categories, even for basic tasks such as distinguishing
microscopy modalities; ii) current specialist models fine-tuned on biomedical
data often perform worse than generalist models; iii) fine-tuning in specific
microscopy domains can cause catastrophic forgetting, eroding prior biomedical
knowledge encoded in the base model; and iv) weight interpolation between
fine-tuned and pre-trained models offers one solution to forgetting and
improves general performance across biomedical tasks. We release μ-Bench
under a permissive license to accelerate the research and development of
microscopy foundation models.
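
The weight interpolation mentioned in finding iv) can be illustrated with a minimal sketch. The abstract does not specify the interpolation scheme, so this assumes simple linear interpolation of matching parameters; the function name `interpolate_weights`, the mixing coefficient `alpha`, and the toy parameter dictionaries are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of values.
# (Assumption: real checkpoints would hold tensors, not Python lists.)
Weights = Dict[str, List[float]]


def interpolate_weights(pretrained: Weights, finetuned: Weights, alpha: float) -> Weights:
    """Return (1 - alpha) * pretrained + alpha * finetuned for every parameter."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if pretrained.keys() != finetuned.keys():
        raise ValueError("both models must share the same parameter names")

    merged: Weights = {}
    for name, base in pretrained.items():
        tuned = finetuned[name]
        if len(base) != len(tuned):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Element-wise linear interpolation between the two checkpoints.
        merged[name] = [(1.0 - alpha) * b + alpha * t for b, t in zip(base, tuned)]
    return merged


# Hypothetical toy checkpoints for illustration only.
pretrained = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
finetuned = {"layer.weight": [4.0, 2.0], "layer.bias": [3.0]}

merged = interpolate_weights(pretrained, finetuned, alpha=0.5)
print(merged)
```

With `alpha = 0` the result equals the pre-trained weights and with `alpha = 1` it equals the fine-tuned weights; intermediate values trade off specialization against retained base-model knowledge.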