HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation
February 14, 2025
Authors: Tianwei Lin, Wenqiao Zhang, Sijing Li, Yuqian Yuan, Binhe Yu, Haoyuan Li, Wanggui He, Hao Jiang, Mengze Li, Xiaohui Song, Siliang Tang, Jun Xiao, Hui Lin, Yueting Zhuang, Beng Chin Ooi
cs.AI
Abstract
We present HealthGPT, a powerful Medical Large Vision-Language Model
(Med-LVLM) that integrates medical visual comprehension and generation
capabilities within a unified autoregressive paradigm. Our bootstrapping
philosophy is to progressively adapt heterogeneous comprehension and generation
knowledge to pre-trained large language models (LLMs). This is achieved through
a novel heterogeneous low-rank adaptation (H-LoRA) technique, which is
complemented by a tailored hierarchical visual perception approach and a
three-stage learning strategy. To train HealthGPT effectively, we devise a
comprehensive medical domain-specific comprehension and generation dataset
called VL-Health. Experimental results demonstrate the exceptional performance
and scalability of HealthGPT on unified medical visual tasks. Our project can
be accessed at https://github.com/DCDmllm/HealthGPT.
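For a concrete picture of the adapter mechanism the abstract names, the sketch below is a minimal, hypothetical rendering of heterogeneous low-rank adaptation in PyTorch: a frozen pretrained linear layer carries two independent low-rank branches, one routed for comprehension and one for generation, so the two kinds of knowledge are stored in separate adapter weights. The class name `HLoRALinear`, the rank and scaling values, and the string-keyed task routing are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class HLoRALinear(nn.Module):
    """Frozen linear layer with task-specific low-rank adapters.

    A sketch of the heterogeneous-LoRA idea: one adapter branch per task
    ("comprehension" vs. "generation"), each a standard LoRA pair. Rank,
    scaling, and routing are illustrative choices, not HealthGPT's actual
    configuration.
    """

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # keep pretrained weights frozen
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.scale = alpha / rank
        # One independent low-rank (down, up) pair per task branch.
        self.adapters = nn.ModuleDict({
            task: nn.ModuleDict({
                "down": nn.Linear(base.in_features, rank, bias=False),
                "up": nn.Linear(rank, base.out_features, bias=False),
            })
            for task in ("comprehension", "generation")
        })
        for branch in self.adapters.values():
            nn.init.zeros_(branch["up"].weight)  # adapters start as a no-op update

    def forward(self, x: torch.Tensor, task: str = "comprehension") -> torch.Tensor:
        # Route through the adapter branch that matches the current task.
        branch = self.adapters[task]
        return self.base(x) + self.scale * branch["up"](branch["down"](x))

# Usage: wrap one projection of a pretrained model and switch branches by task.
layer = HLoRALinear(nn.Linear(512, 512))
x = torch.randn(2, 16, 512)
y_comp = layer(x, task="comprehension")
y_gen = layer(x, task="generation")
```

Because only the adapter parameters are trainable, each task branch can be optimized on its own data without overwriting the frozen LLM weights or the other branch, which is the gist of adapting heterogeneous knowledge to one backbone.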