VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images
August 28, 2024
作者: M. Maruf, Arka Daw, Kazi Sajeed Mehrab, Harish Babu Manogaran, Abhilash Neog, Medha Sawhney, Mridul Khurana, James P. Balhoff, Yasin Bakis, Bahadir Altintas, Matthew J. Thompson, Elizabeth G. Campolongo, Josef C. Uyeda, Hilmar Lapp, Henry L. Bart, Paula M. Mabee, Yu Su, Wei-Lun Chao, Charles Stewart, Tanya Berger-Wolf, Wasila Dahdul, Anuj Karpatne
cs.AI
Abstract
Images are increasingly becoming the currency for documenting biodiversity on
the planet, providing novel opportunities for accelerating scientific
discoveries in the field of organismal biology, especially with the advent of
large vision-language models (VLMs). We ask if pre-trained VLMs can aid
scientists in answering a range of biologically relevant questions without any
additional fine-tuning. In this paper, we evaluate the effectiveness of 12
state-of-the-art (SOTA) VLMs in the field of organismal biology using a novel
dataset, VLM4Bio, consisting of 469K question-answer pairs involving 30K images
from three groups of organisms: fishes, birds, and butterflies, covering five
biologically relevant tasks. We also explore the effects of applying prompting
techniques and tests for reasoning hallucination on the performance of VLMs,
shedding new light on the capabilities of current SOTA VLMs in answering
biologically relevant questions using images. The code and datasets for running
all the analyses reported in this paper can be found at
https://github.com/sammarfy/VLM4Bio.