μ-Bench: A Vision-Language Benchmark for Microscopy Understanding

July 1, 2024
Authors: Alejandro Lozano, Jeffrey Nirschl, James Burgess, Sanket Rajan Gupte, Yuhui Zhang, Alyssa Unell, Serena Yeung-Levy
cs.AI

Abstract

Recent advances in microscopy have enabled the rapid generation of terabytes of image data in cell biology and biomedical research. Vision-language models (VLMs) offer a promising solution for large-scale biological image analysis, enhancing researchers' efficiency, identifying new image biomarkers, and accelerating hypothesis generation and scientific discovery. However, there is a lack of standardized, diverse, and large-scale vision-language benchmarks to evaluate VLMs' perception and cognition capabilities in biological image understanding. To address this gap, we introduce μ-Bench, an expert-curated benchmark encompassing 22 biomedical tasks across various scientific disciplines (biology, pathology), microscopy modalities (electron, fluorescence, light), scales (subcellular, cellular, tissue), and organisms in both normal and abnormal states. We evaluate state-of-the-art biomedical, pathology, and general VLMs on μ-Bench and find that: i) current models struggle on all categories, even for basic tasks such as distinguishing microscopy modalities; ii) current specialist models fine-tuned on biomedical data often perform worse than generalist models; iii) fine-tuning in specific microscopy domains can cause catastrophic forgetting, eroding prior biomedical knowledge encoded in their base model; and iv) weight interpolation between fine-tuned and pre-trained models offers one solution to forgetting and improves general performance across biomedical tasks. We release μ-Bench under a permissive license to accelerate the research and development of microscopy foundation models.
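
The abstract does not say how the weight interpolation in finding iv) is carried out. As a rough illustration only, the sketch below assumes the simplest variant: a per-parameter linear blend of two checkpoints that share an architecture, controlled by a mixing coefficient. The function name `interpolate_weights`, the `alpha` parameter, and the toy dictionaries standing in for model checkpoints are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical stand-in for a model checkpoint: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def interpolate_weights(pretrained: Weights, finetuned: Weights, alpha: float) -> Weights:
    """Return (1 - alpha) * pretrained + alpha * finetuned, parameter by parameter.

    alpha = 0.0 reproduces the pre-trained weights; alpha = 1.0 reproduces the
    fine-tuned weights. Linear blending is an assumption made for this sketch.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if pretrained.keys() != finetuned.keys():
        raise ValueError("both checkpoints must share the same parameter names")

    merged: Weights = {}
    for name, w_pre in pretrained.items():
        w_ft = finetuned[name]
        if len(w_pre) != len(w_ft):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Element-wise convex combination of the two parameter vectors.
        merged[name] = [(1.0 - alpha) * p + alpha * f for p, f in zip(w_pre, w_ft)]
    return merged


# Toy checkpoints (made-up numbers, not real model weights).
pretrained = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
finetuned = {"layer.weight": [4.0, 2.0], "layer.bias": [3.0]}

merged = interpolate_weights(pretrained, finetuned, alpha=0.5)
```

In a setup like this, `alpha` would typically be chosen on held-out data, trading off performance on the fine-tuning domain against retention of what the base model already knew.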
