

MicroVQA++: High-Quality Microscopy Reasoning Dataset with Weakly Supervised Graphs for Multimodal Large Language Model

November 14, 2025
Authors: Manyu Li, Ruian He, Chenxi Ma, Weimin Tan, Bo Yan
cs.AI

Abstract

Multimodal Large Language Models (MLLMs) are increasingly applied to biomedical imaging, yet scientific reasoning for microscopy remains limited by the scarcity of large-scale, high-quality training data. We introduce MicroVQA++, a three-stage, large-scale, high-quality microscopy VQA corpus derived from the BIOMEDICA archive. Stage one bootstraps supervision from expert-validated figure-caption pairs sourced from peer-reviewed articles. Stage two applies HiCQA-Graph, a novel heterogeneous graph over images, captions, and QAs that fuses NLI-based textual entailment, CLIP-based vision-language alignment, and agent signals to identify and filter inconsistent samples. Stage three uses an MLLM agent to generate multiple-choice questions (MCQs), followed by human screening. The resulting release comprises a large training split and a human-checked test split whose hard-sample distribution across Bloom's taxonomy levels exceeds that of the MicroVQA benchmark. Our work delivers (i) a quality-controlled dataset that couples expert literature with graph-based filtering and human refinement; (ii) HiCQA-Graph, the first graph that jointly models (image, caption, QA) triplets for cross-modal consistency filtering; (iii) evidence that careful data construction enables 4B-scale MLLMs to reach microscopy reasoning performance competitive with GPT-5 and to achieve state-of-the-art results among open-source MLLMs. Code and dataset will be released after the review process concludes.
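To make the Stage-two filtering idea concrete, below is a minimal sketch of the kind of per-sample consistency signals HiCQA-Graph fuses: NLI entailment between a caption and QA text, plus CLIP image-caption alignment. The model names (`roberta-large-mnli`, `openai/clip-vit-base-patch32`), thresholds, and the simple score-and-threshold fusion are illustrative assumptions for this sketch, not the paper's released implementation, which additionally propagates these signals (together with agent signals) over a heterogeneous graph.

```python
# Hypothetical sketch of cross-modal consistency scoring for an
# (image, caption, QA) triplet, using off-the-shelf models.
# Model choices and thresholds are assumptions, not the authors' code.
import torch
from PIL import Image
from transformers import pipeline, CLIPModel, CLIPProcessor

nli = pipeline("text-classification", model="roberta-large-mnli")
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def consistency_scores(image: Image.Image, caption: str, qa_text: str):
    # Textual entailment: does the caption entail the QA statement?
    nli_out = nli({"text": caption, "text_pair": qa_text})
    entail = nli_out["score"] if nli_out["label"] == "ENTAILMENT" else 0.0

    # Vision-language alignment: cosine similarity of CLIP embeddings.
    inputs = clip_proc(text=[caption], images=image,
                       return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = clip_model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    align = float((img @ txt.T).item())
    return entail, align

def keep_sample(image, caption, qa_text, t_nli=0.5, t_clip=0.25):
    # Drop triplets whose textual and visual signals disagree;
    # the thresholds here are placeholders for illustration.
    entail, align = consistency_scores(image, caption, qa_text)
    return entail >= t_nli and align >= t_clip
```

In the paper's pipeline these node-level signals are combined on a graph over images, captions, and QAs rather than thresholded independently per sample; the sketch only shows the two scoring primitives named in the abstract.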