ACE-LoRA: Graph-Attentive Context Enhancement for Parameter-Efficient Adaptation of Medical Vision-Language Models
March 17, 2026
Authors: M. Arda Aydın, Melih B. Yilmaz, Aykut Koç, Tolga Çukur
cs.AI
Abstract
The success of CLIP-like vision-language models (VLMs) on natural images has inspired medical counterparts, yet existing approaches largely fall into two extremes: specialist models trained on single-domain data, which capture domain-specific details but generalize poorly, and generalist medical VLMs trained on multi-domain data, which retain broad semantics but dilute fine-grained diagnostic cues. Bridging this specialization-generalization trade-off remains challenging. To address this problem, we propose ACE-LoRA, a parameter-efficient adaptation framework for generalist medical VLMs that maintains robust zero-shot generalization. ACE-LoRA integrates Low-Rank Adaptation (LoRA) modules into frozen image-text encoders and introduces an Attention-based Context Enhancement Hypergraph Neural Network (ACE-HGNN) module that captures higher-order contextual interactions beyond pairwise similarity to enrich global representations with localized diagnostic cues, addressing a key limitation of prior Parameter-Efficient Fine-Tuning (PEFT) methods that overlook fine-grained details. To further enhance cross-modal alignment, we formulate a label-guided InfoNCE loss to effectively suppress false negatives between semantically related image-text pairs. Despite adding only 0.95M trainable parameters, ACE-LoRA consistently outperforms state-of-the-art medical VLMs and PEFT baselines across zero-shot classification, segmentation, and detection benchmarks spanning multiple domains. Our code is available at https://github.com/icon-lab/ACE-LoRA.
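The label-guided InfoNCE loss described above can be illustrated with a minimal numpy sketch. This is a hypothetical reconstruction based only on the abstract: the function name, the exact masking scheme (excluding same-label off-diagonal pairs from the negative set), and the temperature value are assumptions, not the authors' implementation.

```python
import numpy as np

def label_guided_infonce(img_emb, txt_emb, labels, tau=0.07):
    """Hypothetical sketch of a label-guided InfoNCE loss.

    Image-text pairs (i, j) with i != j that share a class label are
    removed from the negative set of the contrastive denominator, so
    semantically related pairs are not penalized as false negatives.
    """
    # L2-normalize embeddings, as in CLIP-style contrastive training
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / tau                       # (N, N) cosine similarities

    # Mask: keep only cross-label pairs as negatives ...
    neg_mask = labels[:, None] != labels[None, :]
    # ... but always keep the matched pair (i, i) as the positive
    np.fill_diagonal(neg_mask, True)

    # Numerically stable softmax over the masked denominator
    exp = np.exp(logits - logits.max(axis=1, keepdims=True)) * neg_mask
    idx = np.arange(len(labels))
    loss = -np.log(exp[idx, idx] / exp.sum(axis=1))
    return loss.mean()
```

When all labels are distinct, the mask is all-ones and this reduces to standard InfoNCE; shared labels shrink the denominator, so same-label pairs no longer push matched embeddings apart.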