DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry
December 12, 2025
Authors: Zhenyang Cai, Jiaming Zhang, Junjie Zhao, Ziyi Zeng, Yanchao Li, Jingyi Liang, Junying Chen, Yunjin Yang, Jiajun You, Shuzhi Deng, Tongfei Wang, Wanting Chen, Chunxiu Hao, Ruiqi Xie, Zhenwei Wen, Xiangyi Feng, Zou Ting, Jin Zou Lin, Jianquan Li, Guangjun Yu, Liangyi Chen, Junwen Wang, Shan Jiang, Benyou Wang
cs.AI
Abstract
Reliable interpretation of multimodal data in dentistry is essential for automated oral healthcare, yet current multimodal large language models (MLLMs) struggle to capture fine-grained dental visual details and lack sufficient reasoning ability for precise diagnosis. To address these limitations, we present DentalGPT, a specialized dental MLLM developed through high-quality domain knowledge injection and reinforcement learning. Specifically, we construct the largest annotated multimodal dataset for dentistry to date by aggregating over 120k dental images paired with detailed descriptions that highlight diagnostically relevant visual features; it is also the multimodal dataset with the most extensive collection of dental images. Training on this dataset significantly enhances the MLLM's visual understanding of dental conditions, while the subsequent reinforcement learning stage further strengthens its capability for multimodal complex reasoning. Comprehensive evaluations on intraoral and panoramic benchmarks, along with dental subsets of medical VQA benchmarks, show that DentalGPT achieves superior performance in disease classification and dental VQA tasks, outperforming many state-of-the-art MLLMs despite having only 7B parameters. These results demonstrate that high-quality dental data combined with staged adaptation provides an effective pathway for building capable, domain-specialized dental MLLMs.