
Enhancing Step-by-Step and Verifiable Medical Reasoning in MLLMs

June 20, 2025
Authors: Haoran Sun, Yankai Jiang, Wenjie Lou, Yujie Zhang, Wenjie Li, Lilong Wang, Mianxin Liu, Lei Liu, Xiaosong Wang
cs.AI

Abstract

Multimodal large language models (MLLMs) have begun to demonstrate robust reasoning capabilities on general tasks, yet their application in the medical domain remains in its early stages. Constructing chain-of-thought (CoT) training data is essential for strengthening the reasoning abilities of medical MLLMs. However, existing approaches lack a comprehensive framework for searching and evaluating effective reasoning paths towards critical diagnosis. To address this challenge, we propose Mentor-Intern Collaborative Search (MICS), a novel reasoning-path search scheme for generating rigorous and effective medical CoT data. MICS first leverages mentor models to initialize the reasoning one step at a time, then prompts each intern model to continue the thinking along those initiated paths, and finally selects the optimal reasoning path according to the overall reasoning performance of multiple intern models. The reasoning performance is determined by an MICS-Score, which assesses the quality of generated reasoning paths. Finally, we construct MMRP, a multi-task medical reasoning dataset with ranked difficulty, and Chiron-o1, a new medical MLLM devised via a curriculum learning strategy, with robust visual question-answering and generalizable reasoning capabilities. Extensive experiments demonstrate that Chiron-o1, trained on our CoT dataset constructed using MICS, achieves state-of-the-art performance across a range of medical visual question answering and reasoning benchmarks. Code is available in the GitHub repository manglu097/Chiron-o1.
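
The search loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the MICS-Score, so the sketch assumes it is the fraction of intern models that reach the reference answer when continuing from a candidate path. The names `mics_score` and `mics_search`, the `max_steps` parameter, and the toy mentors and interns are all hypothetical stand-ins for real model calls.

```python
from typing import Callable, List, Sequence

# Hypothetical interfaces: a mentor proposes the next reasoning step,
# an intern continues from a partial path and returns a final answer.
Mentor = Callable[[str, List[str]], str]
Intern = Callable[[str, List[str]], str]


def mics_score(question: str, prefix: List[str],
               interns: Sequence[Intern], gold: str) -> float:
    """Assumed score: share of interns that reach `gold` from `prefix`."""
    hits = sum(1 for intern in interns if intern(question, prefix) == gold)
    return hits / len(interns)


def mics_search(question: str, gold: str, mentors: Sequence[Mentor],
                interns: Sequence[Intern], max_steps: int = 4) -> List[str]:
    """Grow a reasoning path one step at a time, keeping the best-scoring step."""
    path: List[str] = []
    for _ in range(max_steps):
        # Each mentor proposes one candidate extension of the current path.
        candidates = [path + [mentor(question, path)] for mentor in mentors]
        scores = [mics_score(question, c, interns, gold) for c in candidates]
        best = max(range(len(candidates)), key=scores.__getitem__)
        path = candidates[best]
        if scores[best] == 1.0:  # every intern already succeeds from this path
            break
    return path


# Toy stand-ins for model calls (illustrative only).
def good_mentor(question: str, prefix: List[str]) -> str:
    return f"step {len(prefix) + 1}: note consolidation"


def bad_mentor(question: str, prefix: List[str]) -> str:
    return f"step {len(prefix) + 1}: unrelated remark"


def make_intern(needed: int) -> Intern:
    # This intern answers correctly once it has seen `needed` useful steps.
    def intern(question: str, prefix: List[str]) -> str:
        useful = sum("consolidation" in step for step in prefix)
        return "pneumonia" if useful >= needed else "unknown"
    return intern


interns = [make_intern(1), make_intern(2)]
path = mics_search("What does the chest X-ray show?", "pneumonia",
                   [bad_mentor, good_mentor], interns)
print(path)
```

Scoring a step by how well several weaker models can finish from it, rather than by a single judge, is the design choice the abstract emphasizes; the greedy step-by-step selection and the early stop when all interns succeed are simplifications made for this sketch.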