Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy

Abstract

Medical Visual Question Answering (MedVQA) is a promising field for developing clinical decision support systems, yet progress is often limited by the available datasets, which can lack clinical complexity and visual diversity. To address these gaps, we introduce Kvasir-VQA-x1, a new, large-scale dataset for gastrointestinal (GI) endoscopy. Our work significantly expands the original Kvasir-VQA by incorporating 159,549 new question-answer pairs designed to test deeper clinical reasoning. We developed a systematic method that uses large language models to generate these questions, which are stratified by complexity to better assess a model's inference capabilities. To prepare models for real-world clinical scenarios, we also introduce a variety of visual augmentations that mimic common imaging artifacts. The dataset is structured to support two main evaluation tracks: one for standard VQA performance and one for model robustness against these visual perturbations. By providing a more challenging and clinically relevant benchmark, Kvasir-VQA-x1 aims to accelerate the development of more reliable and effective multimodal AI systems for use in clinical settings. The dataset is fully accessible and adheres to FAIR data principles, making it a valuable resource for the wider research community. Code and data are available at https://github.com/Simula/Kvasir-VQA-x1 and https://huggingface.co/datasets/SimulaMet/Kvasir-VQA-x1
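The two evaluation tracks and the complexity stratification described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the record keys (`image`, `perturbed_image`, `question`, `answer`, `complexity`), the integer complexity levels, the exact-match metric, and the toy model are hypothetical and are not taken from the dataset's actual schema or the authors' evaluation protocol.

```python
from collections import defaultdict


def exact_match(prediction: str, reference: str) -> bool:
    """Case- and whitespace-insensitive match (a simple stand-in metric)."""
    return prediction.strip().lower() == reference.strip().lower()


def accuracy_by_complexity(records, predict, track):
    """Return {complexity_level: accuracy} for one evaluation track.

    records: iterable of dicts with hypothetical keys
             'image', 'perturbed_image', 'question', 'answer', 'complexity'.
    predict: callable (image, question) -> answer string.
    track:   'standard' scores the original image,
             'robustness' scores the perturbed image.
    """
    image_key = {"standard": "image", "robustness": "perturbed_image"}[track]
    hits, totals = defaultdict(int), defaultdict(int)
    for rec in records:
        level = rec["complexity"]
        totals[level] += 1
        if exact_match(predict(rec[image_key], rec["question"]), rec["answer"]):
            hits[level] += 1
    return {level: hits[level] / totals[level] for level in sorted(totals)}


# Toy records; strings stand in for image data (assumption for illustration).
toy_records = [
    {"image": "img0", "perturbed_image": "img0_blurred",
     "question": "Is a polyp visible?", "answer": "yes", "complexity": 1},
    {"image": "img1", "perturbed_image": "img1_blurred",
     "question": "Is an instrument present?", "answer": "no", "complexity": 1},
    {"image": "img2", "perturbed_image": "img2_blurred",
     "question": "How many polyps are there and where?",
     "answer": "one polyp in the center", "complexity": 2},
]


def toy_model(image, question):
    """Placeholder model: a lookup that mostly fails on perturbed inputs."""
    answers = {
        "img0": "Yes",
        "img1": "no",
        "img2": "one polyp in the center",
        "img0_blurred": "yes",
    }
    return answers.get(image, "unknown")


standard = accuracy_by_complexity(toy_records, toy_model, "standard")
robust = accuracy_by_complexity(toy_records, toy_model, "robustness")
print("standard track:  ", standard)
print("robustness track:", robust)
```

Reporting accuracy per complexity level on both tracks makes it visible whether a model's errors concentrate on harder questions, on perturbed images, or on both.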
