Kvasir-VQA: A Text-Image Pair GI Tract Dataset
September 2, 2024
Authors: Sushant Gautam, Andrea Storås, Cise Midoglu, Steven A. Hicks, Vajira Thambawita, Pål Halvorsen, Michael A. Riegler
cs.AI
Abstract
We introduce Kvasir-VQA, an extended dataset derived from the HyperKvasir and
Kvasir-Instrument datasets, augmented with question-and-answer annotations to
facilitate advanced machine learning tasks in Gastrointestinal (GI)
diagnostics. This dataset comprises 6,500 annotated images spanning various GI
tract conditions and surgical instruments, and it supports multiple question
types including yes/no, choice, location, and numerical count. The dataset is
intended for applications such as image captioning, Visual Question Answering
(VQA), text-based generation of synthetic medical images, object detection, and
classification. Our experiments demonstrate the dataset's effectiveness in
training models for three selected tasks, showcasing significant applications
in medical image analysis and diagnostics. We also present evaluation metrics
for each task, highlighting the usability and versatility of our dataset. The
dataset and supporting artifacts are available at
https://datasets.simula.no/kvasir-vqa.
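To make the four question types named in the abstract concrete, the following is a minimal sketch of how question-and-answer annotations of this kind might be represented and scored per question type. The `QAPair` fields, the sample records, and the exact-match scoring rule are all illustrative assumptions; they are not the dataset's actual schema or the evaluation metrics used by the authors.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPair:
    """Hypothetical record layout; the real dataset's fields may differ."""
    image_id: str
    question_type: str  # e.g. "yes/no", "choice", "location", "count"
    question: str
    answer: str


def accuracy_by_type(pairs, predictions):
    """Case-insensitive exact-match accuracy, grouped by question type.

    Exact match is an assumed stand-in metric, not the paper's.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for pair, pred in zip(pairs, predictions):
        total[pair.question_type] += 1
        if pred.strip().lower() == pair.answer.strip().lower():
            correct[pair.question_type] += 1
    return {qtype: correct[qtype] / total[qtype] for qtype in total}


# Made-up sample annotations, for illustration only.
pairs = [
    QAPair("img_001", "yes/no", "Is there a polyp?", "yes"),
    QAPair("img_001", "count", "How many polyps are there?", "1"),
    QAPair("img_002", "location", "Where is the instrument?", "upper-left"),
    QAPair("img_002", "yes/no", "Is there an instrument?", "yes"),
]
predictions = ["Yes", "2", "upper-left", "no"]

scores = accuracy_by_type(pairs, predictions)
for qtype, score in scores.items():
    print(f"{qtype}: {score:.2f}")
```

Reporting accuracy separately for each question type keeps a model that does well on yes/no questions from hiding weak performance on counting or localization questions.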