MedVLSynther: Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs
October 29, 2025
Authors: Xiaoke Huang, Ningsen Wang, Hui Liu, Xianfeng Tang, Yuyin Zhou
cs.AI
Abstract
Large Multimodal Models (LMMs) are increasingly capable of answering medical
questions that require joint reasoning over images and text, yet training
general medical VQA systems is impeded by the lack of large, openly usable,
high-quality corpora. We present MedVLSynther, a rubric-guided
generator-verifier framework that synthesizes high-quality multiple-choice VQA
items directly from open biomedical literature by conditioning on figures,
captions, and in-text references. The generator produces self-contained stems
and parallel, mutually exclusive options under a machine-checkable JSON schema;
a multi-stage verifier enforces essential gates (self-containment, single
correct answer, clinical validity, image-text consistency), awards fine-grained
positive points, and penalizes common failure modes before acceptance. Applying
this pipeline to PubMed Central yields MedSynVQA: 13,087 audited questions over
14,803 images spanning 13 imaging modalities and 28 anatomical regions.
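
The generate-then-verify acceptance step can be illustrated with a minimal Python sketch. It assumes a hypothetical item format (keys `question`, `options` keyed A to E, and `answer`); the paper's actual JSON schema and rubric are not reproduced here. `schema_errors` performs machine-checkable structural checks on a generator output, and `accept` applies a verifier decision rule in the spirit described above: any failed essential gate rejects the item outright, otherwise positive points minus penalties must reach a threshold. The `Verdict` class, the criterion names other than the four gates, and `min_score` are illustrative placeholders; in the described pipeline the per-criterion judgments come from a verifier LMM.

```python
import json
from dataclasses import dataclass

# Hypothetical item format: the key names ("question", "options", "answer")
# and the five option letters A-E are illustrative assumptions.
OPTION_KEYS = ["A", "B", "C", "D", "E"]

# The four essential gates named in the abstract.
ESSENTIAL_GATES = ("self_contained", "single_correct_answer",
                   "clinically_valid", "image_text_consistent")


def schema_errors(raw: str) -> list:
    """Return the schema violations of one generator output (empty list if valid)."""
    try:
        item = json.loads(raw)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(item, dict):
        return ["top-level value must be an object"]

    errors = []
    stem = item.get("question")
    if not isinstance(stem, str) or not stem.strip():
        errors.append("missing or empty question stem")

    options = item.get("options")
    if not isinstance(options, dict) or sorted(options) != OPTION_KEYS:
        errors.append("options must be an object keyed A-E")
    else:
        # Distinct option texts are only a cheap proxy for mutual exclusivity.
        texts = [str(v).strip().lower() for v in options.values()]
        if len(set(texts)) != len(texts):
            errors.append("option texts must be distinct")
        if item.get("answer") not in OPTION_KEYS:
            errors.append("answer must be one of the option keys")
    return errors


@dataclass
class Verdict:
    """Per-criterion judgments, assumed to be produced by a verifier LMM."""
    gates: dict      # essential gate name -> passed (bool)
    positives: dict  # positive criterion name -> points awarded
    penalties: dict  # failure-mode name -> points deducted


def accept(verdict: Verdict, min_score: float = 3.0) -> bool:
    """Reject on any failed or missing gate; otherwise require a minimum net score."""
    if not all(verdict.gates.get(g, False) for g in ESSENTIAL_GATES):
        return False
    net = sum(verdict.positives.values()) - sum(verdict.penalties.values())
    return net >= min_score
```

Treating the gates as hard filters rather than as heavily weighted points means no amount of positive credit can rescue an item with, for example, two defensible answers.
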
Training open-weight LMMs with reinforcement learning using verifiable rewards
improves accuracy across six medical VQA benchmarks, achieving averages of
55.85 (3B) and 58.15 (7B), with up to 77.57 on VQA-RAD and 67.76 on PathVQA,
outperforming strong medical LMMs. Ablations verify that both generation and
verification are necessary and that more verified data consistently helps, and
a targeted contamination analysis detects no leakage from evaluation suites. By
operating entirely on open literature and open-weight models, MedVLSynther
offers an auditable, reproducible, and privacy-preserving path to scalable
medical VQA training data.
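
Because every synthesized item is multiple choice with a single keyed answer, the reward used for reinforcement learning can be checked programmatically rather than judged by another model. The following is a minimal sketch assuming a hypothetical convention in which the policy ends its response with `<answer>X</answer>`; the tag format, the A to E letter range, and the function names are assumptions for illustration, not the paper's implementation.

```python
import re

# Hypothetical output convention: the policy ends its response with
# "<answer>X</answer>", where X is an option letter A-E (an assumption).
ANSWER_TAG = re.compile(r"<answer>\s*([A-E])\s*</answer>", re.IGNORECASE)


def extract_choice(response: str):
    """Return the last tagged option letter (upper-cased), or None if there is none."""
    matches = ANSWER_TAG.findall(response)
    return matches[-1].upper() if matches else None


def verifiable_reward(response: str, gold: str) -> float:
    """Binary reward: 1.0 if the extracted letter equals the gold key, else 0.0."""
    choice = extract_choice(response)
    return 1.0 if choice is not None and choice == gold.strip().upper() else 0.0


# Usage: score a small group of sampled responses for one question.
samples = [
    "Reasoning about the figure and caption... <answer>B</answer>",
    "Probably C.",  # no tag, so the reward is 0.0
    "<answer>a</answer> ... on reflection <answer>B</answer>",
]
rewards = [verifiable_reward(s, "B") for s in samples]
# -> [1.0, 0.0, 1.0]
```

Taking the last tagged letter rather than the first is a design choice that tolerates earlier drafts of the answer within a response; responses with no parsable tag simply earn zero reward.
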