MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs

February 13, 2026
Authors: Baorong Shi, Bo Cui, Boyuan Jiang, Deli Yu, Fang Qian, Haihua Yang, Huichao Wang, Jiale Chen, Jianfei Pan, Jieqiong Cao, Jinghao Lin, Kai Wu, Lin Yang, Shengsheng Yao, Tao Chen, Xiaojun Xiao, Xiaozhong Ji, Xu Wang, Yijun He, Zhixiong Yang
cs.AI

Abstract

We present MedXIAOHE, a medical vision-language foundation model designed to advance general-purpose medical understanding and reasoning in real-world clinical applications. MedXIAOHE achieves state-of-the-art performance across diverse medical benchmarks and surpasses leading closed-source multimodal systems on multiple capabilities. To achieve this, we propose an entity-aware continual pretraining framework that organizes heterogeneous medical corpora to broaden knowledge coverage and reduce long-tail gaps (e.g., rare diseases). For medical expert-level reasoning and interaction, MedXIAOHE incorporates diverse medical reasoning patterns via reinforcement learning and tool-augmented agentic training, enabling multi-step diagnostic reasoning with verifiable decision traces. To improve reliability in real-world use, MedXIAOHE integrates user-preference rubrics, evidence-grounded reasoning, and low-hallucination long-form report generation, with improved adherence to medical instructions. We release this report to document our practical design choices, scaling insights, and evaluation framework, hoping to inspire further research.
February 17, 2026