HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation
February 14, 2025
Authors: Tianwei Lin, Wenqiao Zhang, Sijing Li, Yuqian Yuan, Binhe Yu, Haoyuan Li, Wanggui He, Hao Jiang, Mengze Li, Xiaohui Song, Siliang Tang, Jun Xiao, Hui Lin, Yueting Zhuang, Beng Chin Ooi
cs.AI
Abstract
We present HealthGPT, a powerful Medical Large Vision-Language Model
(Med-LVLM) that integrates medical visual comprehension and generation
capabilities within a unified autoregressive paradigm. Our bootstrapping
philosophy is to progressively adapt heterogeneous comprehension and generation
knowledge to pre-trained large language models (LLMs). This is achieved through
a novel heterogeneous low-rank adaptation (H-LoRA) technique, which is
complemented by a tailored hierarchical visual perception approach and a
three-stage learning strategy. To train HealthGPT effectively, we devise a
comprehensive medical domain-specific comprehension and generation dataset
called VL-Health. Experimental results demonstrate the exceptional performance
and scalability of HealthGPT on unified medical visual tasks. Our project can
be accessed at https://github.com/DCDmllm/HealthGPT.
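For a concrete picture of the adapter mechanism the abstract names, the sketch below is a minimal, hypothetical rendering of heterogeneous low-rank adaptation in PyTorch: a frozen pretrained linear layer carries two independent low-rank branches, one routed for comprehension and one for generation, so the two kinds of knowledge are stored in separate adapter weights. The class name `HLoRALinear`, the rank and scaling values, and the string-keyed task routing are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class HLoRALinear(nn.Module):
    """Frozen linear layer with task-specific low-rank adapters.

    A sketch of the heterogeneous-LoRA idea: one adapter branch per task
    ("comprehension" vs. "generation"), each a standard LoRA pair. Rank,
    scaling, and routing are illustrative choices, not HealthGPT's actual
    configuration.
    """

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # keep pretrained weights frozen
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.scale = alpha / rank
        # One independent low-rank (down, up) pair per task branch.
        self.adapters = nn.ModuleDict({
            task: nn.ModuleDict({
                "down": nn.Linear(base.in_features, rank, bias=False),
                "up": nn.Linear(rank, base.out_features, bias=False),
            })
            for task in ("comprehension", "generation")
        })
        for branch in self.adapters.values():
            nn.init.zeros_(branch["up"].weight)  # adapters start as a no-op update

    def forward(self, x: torch.Tensor, task: str = "comprehension") -> torch.Tensor:
        # Route through the adapter branch that matches the current task.
        branch = self.adapters[task]
        return self.base(x) + self.scale * branch["up"](branch["down"](x))

# Usage: wrap one projection of a pretrained model and switch branches by task.
layer = HLoRALinear(nn.Linear(512, 512))
x = torch.randn(2, 16, 512)
y_comp = layer(x, task="comprehension")
y_gen = layer(x, task="generation")
```

Because only the adapter parameters are trainable, each task branch can be optimized on its own data without overwriting the frozen LLM weights or the other branch, which is the gist of adapting heterogeneous knowledge to one backbone.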