ACE-LoRA: Graph-Attentive Context Enhancement for Parameter-Efficient Adaptation of Medical Vision-Language Models
March 17, 2026
Authors: M. Arda Aydın, Melih B. Yilmaz, Aykut Koç, Tolga Çukur
cs.AI
Abstract
The success of CLIP-like vision-language models (VLMs) on natural images has inspired medical counterparts, yet existing approaches largely fall into two extremes: specialist models trained on single-domain data, which capture domain-specific details but generalize poorly, and generalist medical VLMs trained on multi-domain data, which retain broad semantics but dilute fine-grained diagnostic cues. Bridging this specialization-generalization trade-off remains challenging. To address this problem, we propose ACE-LoRA, a parameter-efficient adaptation framework for generalist medical VLMs that maintains robust zero-shot generalization. ACE-LoRA integrates Low-Rank Adaptation (LoRA) modules into frozen image-text encoders and introduces an Attention-based Context Enhancement Hypergraph Neural Network (ACE-HGNN) module that captures higher-order contextual interactions beyond pairwise similarity to enrich global representations with localized diagnostic cues, addressing a key limitation of prior Parameter-Efficient Fine-Tuning (PEFT) methods that overlook fine-grained details. To further enhance cross-modal alignment, we formulate a label-guided InfoNCE loss to effectively suppress false negatives between semantically related image-text pairs. Despite adding only 0.95M trainable parameters, ACE-LoRA consistently outperforms state-of-the-art medical VLMs and PEFT baselines across zero-shot classification, segmentation, and detection benchmarks spanning multiple domains. Our code is available at https://github.com/icon-lab/ACE-LoRA.
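The label-guided InfoNCE loss described above can be illustrated with a minimal numpy sketch. This is a hypothetical reconstruction based only on the abstract: the function name, the exact masking scheme (excluding same-label off-diagonal pairs from the negative set), and the temperature value are assumptions, not the authors' implementation.

```python
import numpy as np

def label_guided_infonce(img_emb, txt_emb, labels, tau=0.07):
    """Hypothetical sketch of a label-guided InfoNCE loss.

    Image-text pairs (i, j) with i != j that share a class label are
    removed from the negative set of the contrastive denominator, so
    semantically related pairs are not penalized as false negatives.
    """
    # L2-normalize embeddings, as in CLIP-style contrastive training
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / tau                       # (N, N) cosine similarities

    # Mask: keep only cross-label pairs as negatives ...
    neg_mask = labels[:, None] != labels[None, :]
    # ... but always keep the matched pair (i, i) as the positive
    np.fill_diagonal(neg_mask, True)

    # Numerically stable softmax over the masked denominator
    exp = np.exp(logits - logits.max(axis=1, keepdims=True)) * neg_mask
    idx = np.arange(len(labels))
    loss = -np.log(exp[idx, idx] / exp.sum(axis=1))
    return loss.mean()
```

When all labels are distinct, the mask is all-ones and this reduces to standard InfoNCE; shared labels shrink the denominator, so same-label pairs no longer push matched embeddings apart.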