MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs

February 13, 2026
Authors: Baorong Shi, Bo Cui, Boyuan Jiang, Deli Yu, Fang Qian, Haihua Yang, Huichao Wang, Jiale Chen, Jianfei Pan, Jieqiong Cao, Jinghao Lin, Kai Wu, Lin Yang, Shengsheng Yao, Tao Chen, Xiaojun Xiao, Xiaozhong Ji, Xu Wang, Yijun He, Zhixiong Yang
cs.AI

Abstract

We present MedXIAOHE, a medical vision-language foundation model designed to advance general-purpose medical understanding and reasoning in real-world clinical applications. MedXIAOHE achieves state-of-the-art performance across diverse medical benchmarks and surpasses leading closed-source multimodal systems on multiple capabilities. To achieve this, we propose an entity-aware continual pretraining framework that organizes heterogeneous medical corpora to broaden knowledge coverage and reduce long-tail gaps (e.g., rare diseases). For medical expert-level reasoning and interaction, MedXIAOHE incorporates diverse medical reasoning patterns via reinforcement learning and tool-augmented agentic training, enabling multi-step diagnostic reasoning with verifiable decision traces. To improve reliability in real-world use, MedXIAOHE integrates user-preference rubrics, evidence-grounded reasoning, and low-hallucination long-form report generation, with improved adherence to medical instructions. We release this report to document our practical design choices, scaling insights, and evaluation framework, hoping to inspire further research.