VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images
August 28, 2024
作者: M. Maruf, Arka Daw, Kazi Sajeed Mehrab, Harish Babu Manogaran, Abhilash Neog, Medha Sawhney, Mridul Khurana, James P. Balhoff, Yasin Bakis, Bahadir Altintas, Matthew J. Thompson, Elizabeth G. Campolongo, Josef C. Uyeda, Hilmar Lapp, Henry L. Bart, Paula M. Mabee, Yu Su, Wei-Lun Chao, Charles Stewart, Tanya Berger-Wolf, Wasila Dahdul, Anuj Karpatne
cs.AI
Abstract
Images are increasingly becoming the currency for documenting biodiversity on
the planet, providing novel opportunities for accelerating scientific
discoveries in the field of organismal biology, especially with the advent of
large vision-language models (VLMs). We ask if pre-trained VLMs can aid
scientists in answering a range of biologically relevant questions without any
additional fine-tuning. In this paper, we evaluate the effectiveness of 12
state-of-the-art (SOTA) VLMs in the field of organismal biology using a novel
dataset, VLM4Bio, consisting of 469K question-answer pairs involving 30K images
from three groups of organisms: fishes, birds, and butterflies, covering five
biologically relevant tasks. We also explore the effects of applying prompting
techniques and tests for reasoning hallucination on the performance of VLMs,
shedding new light on the capabilities of current SOTA VLMs in answering
biologically relevant questions using images. The code and datasets for running
all the analyses reported in this paper can be found at
https://github.com/sammarfy/VLM4Bio.