
Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy

June 11, 2025
作者: Sushant Gautam, Michael A. Riegler, Pål Halvorsen
cs.AI

Abstract

Medical Visual Question Answering (MedVQA) is a promising field for developing clinical decision support systems, yet progress is often limited by the available datasets, which can lack clinical complexity and visual diversity. To address these gaps, we introduce Kvasir-VQA-x1, a new, large-scale dataset for gastrointestinal (GI) endoscopy. Our work significantly expands upon the original Kvasir-VQA by incorporating 159,549 new question-answer pairs that are designed to test deeper clinical reasoning. We developed a systematic method using large language models to generate these questions, which are stratified by complexity to better assess a model's inference capabilities. To ensure our dataset prepares models for real-world clinical scenarios, we have also introduced a variety of visual augmentations that mimic common imaging artifacts. The dataset is structured to support two main evaluation tracks: one for standard VQA performance and another to test model robustness against these visual perturbations. By providing a more challenging and clinically relevant benchmark, Kvasir-VQA-x1 aims to accelerate the development of more reliable and effective multimodal AI systems for use in clinical settings. The dataset is fully accessible and adheres to FAIR data principles, making it a valuable resource for the wider research community. Code and data: https://github.com/Simula/Kvasir-VQA-x1 and https://huggingface.co/datasets/SimulaMet/Kvasir-VQA-x1
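The abstract mentions visual augmentations that mimic common imaging artifacts for the robustness evaluation track. The paper does not specify its augmentation pipeline here, so the following is only an illustrative sketch of the kind of perturbations such a track might use (sensor noise, exposure shift, defocus blur); all function names are hypothetical and the implementation is not taken from the dataset's actual code.

```python
import numpy as np

def add_gaussian_noise(img, sigma=10.0, rng=None):
    """Simulate sensor noise: add zero-mean Gaussian noise, then clip to [0, 255]."""
    rng = rng or np.random.default_rng(0)
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def shift_brightness(img, delta=30):
    """Simulate over-/under-exposure by shifting all pixel intensities by delta."""
    return np.clip(img.astype(np.int16) + delta, 0, 255).astype(np.uint8)

def box_blur(img, k=3):
    """Crude defocus blur: a k x k box filter applied independently per channel."""
    pad = k // 2
    padded = np.pad(img.astype(np.float64),
                    ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1], :]
    return (out / (k * k)).astype(np.uint8)

if __name__ == "__main__":
    # Stand-in for an endoscopy frame (uniform gray, 64x64 RGB).
    frame = np.full((64, 64, 3), 128, dtype=np.uint8)
    for aug in (add_gaussian_noise, shift_brightness, box_blur):
        out = aug(frame)
        print(aug.__name__, out.shape, out.dtype)
```

A robustness track would typically evaluate the same question-answer pairs on both the clean and the perturbed images and report the performance gap.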
PDF · June 12, 2025