
VisText-Mosquito: A Multimodal Dataset and Benchmark for AI-Based Mosquito Breeding Site Detection and Reasoning

June 17, 2025
作者: Md. Adnanul Islam, Md. Faiyaz Abdullah Sayeedi, Md. Asaduzzaman Shuvo, Muhammad Ziaur Rahman, Shahanur Rahman Bappy, Raiyan Rahman, Swakkhar Shatabda
cs.AI

Abstract

Mosquito-borne diseases pose a major global health risk, requiring early detection and proactive control of breeding sites to prevent outbreaks. In this paper, we present VisText-Mosquito, a multimodal dataset that integrates visual and textual data to support automated detection, segmentation, and reasoning for mosquito breeding site analysis. The dataset includes 1,828 annotated images for object detection, 142 images for water surface segmentation, and natural language reasoning texts linked to each image. The YOLOv9s model achieves the highest precision of 0.92926 and mAP@50 of 0.92891 for object detection, while YOLOv11n-Seg reaches a segmentation precision of 0.91587 and mAP@50 of 0.79795. For reasoning generation, our fine-tuned BLIP model achieves a final loss of 0.0028, with a BLEU score of 54.7, BERTScore of 0.91, and ROUGE-L of 0.87. This dataset and model framework emphasize the theme "Prevention is Better than Cure", showcasing how AI-based detection can proactively address mosquito-borne disease risks. The dataset and implementation code are publicly available at GitHub: https://github.com/adnanul-islam-jisun/VisText-Mosquito.
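The detection metrics quoted above (precision and mAP@50) treat a predicted bounding box as a true positive when its Intersection-over-Union (IoU) with a ground-truth box reaches 0.5. The following is a minimal, self-contained sketch of that matching rule on toy boxes; it is an illustration of the metric's core idea, not the paper's evaluation code, which is in the linked GitHub repository.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Overlap is zero when the boxes do not intersect.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def precision_at_50(predictions, ground_truths):
    """Fraction of predicted boxes matching some ground-truth box at IoU >= 0.5."""
    matched = sum(
        1 for p in predictions
        if any(iou(p, g) >= 0.5 for g in ground_truths)
    )
    return matched / len(predictions) if predictions else 0.0

# Toy example: one prediction overlaps the ground-truth box, one does not.
gts = [(10, 10, 50, 50)]
preds = [(12, 12, 48, 48), (100, 100, 120, 120)]
print(precision_at_50(preds, gts))  # -> 0.5
```

Full mAP@50 additionally ranks predictions by confidence and averages precision over recall levels and classes; the threshold test above is the building block shared by all of those steps.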