VisText-Mosquito: A Multimodal Dataset and Benchmark for AI-Based Mosquito Breeding Site Detection and Reasoning
June 17, 2025
Authors: Md. Adnanul Islam, Md. Faiyaz Abdullah Sayeedi, Md. Asaduzzaman Shuvo, Muhammad Ziaur Rahman, Shahanur Rahman Bappy, Raiyan Rahman, Swakkhar Shatabda
cs.AI
Abstract
Mosquito-borne diseases pose a major global health risk, requiring early
detection and proactive control of breeding sites to prevent outbreaks. In this
paper, we present VisText-Mosquito, a multimodal dataset that integrates visual
and textual data to support automated detection, segmentation, and reasoning
for mosquito breeding site analysis. The dataset includes 1,828 annotated
images for object detection, 142 images for water surface segmentation, and
natural language reasoning texts linked to each image. The YOLOv9s model
achieves the highest precision of 0.92926 and mAP@50 of 0.92891 for object
detection, while YOLOv11n-Seg reaches a segmentation precision of 0.91587 and
mAP@50 of 0.79795. For reasoning generation, our fine-tuned BLIP model achieves
a final loss of 0.0028, with a BLEU score of 54.7, BERTScore of 0.91, and
ROUGE-L of 0.87. This dataset and model framework emphasize the theme
"Prevention is Better than Cure", showcasing how AI-based detection can
proactively address mosquito-borne disease risks. The dataset and
implementation code are publicly available on GitHub:
https://github.com/adnanul-islam-jisun/VisText-Mosquito
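The detection and segmentation results above are reported as precision and mAP@50, both of which hinge on IoU-based matching between predicted and ground-truth boxes: at mAP@50, a prediction counts as a true positive only when its IoU with a ground-truth box is at least 0.5. A minimal, library-free sketch of the IoU computation (illustrative only, not taken from the paper's released code):

```python
def iou(box_a, box_b):
    """Intersection-over-Union for two axis-aligned boxes in (x1, y1, x2, y2) form."""
    # Intersection rectangle corners.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Two 10x10 boxes overlapping in a 5x5 region: IoU = 25 / 175, roughly 0.143,
# so this pair would NOT count as a match at the mAP@50 threshold.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

mAP@50 then averages precision over recall levels per class at this 0.5 threshold and takes the mean across classes.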