OralGPT-Omni: A Versatile Dental Multimodal Large Language Model
November 27, 2025
Authors: Jing Hao, Yuci Liang, Lizhuo Lin, Yuxuan Fan, Wenkai Zhou, Kaixin Guo, Zanting Ye, Yanpeng Sun, Xinyu Zhang, Yanqi Yang, Qiankun Li, Hao Tang, James Kit-Hon Tsoi, Linlin Shen, Kuo Feng Hung
cs.AI
Abstract
Multimodal Large Language Models (MLLMs) have shown great potential across numerous medical specialties, yet dentistry remains underexplored, in part because of limited domain-specific data, scarce dental expert annotations, insufficient modality-specific modeling, and reliability challenges. In this paper, we present OralGPT-Omni, the first dental-specialized MLLM designed for comprehensive and trustworthy analysis across diverse dental imaging modalities and clinical tasks. To explicitly capture dentists' diagnostic reasoning, we construct TRACE-CoT, a clinically grounded chain-of-thought dataset that mirrors dental radiologists' decision-making processes. This reasoning supervision, combined with our proposed four-stage training paradigm, substantially strengthens the model's capacity for dental image understanding and analysis. In parallel, we introduce MMOral-Uni, the first unified multimodal benchmark for dental image analysis. It comprises 2,809 open-ended question-answer pairs spanning five imaging modalities and five clinical tasks, offering the most comprehensive evaluation suite to date for MLLMs in digital dentistry. OralGPT-Omni achieves an overall score of 51.84 on the MMOral-Uni benchmark and 45.31 on the MMOral-OPG benchmark, substantially outperforming GPT-5. Our work advances intelligent dentistry and paves the way for future progress in dental image analysis. All code, the benchmark, and the models will be made publicly available.