

ACE-LoRA: Graph-Attentive Context Enhancement for Parameter-Efficient Adaptation of Medical Vision-Language Models

March 17, 2026
Authors: M. Arda Aydın, Melih B. Yilmaz, Aykut Koç, Tolga Çukur
cs.AI

Abstract

The success of CLIP-like vision-language models (VLMs) on natural images has inspired medical counterparts, yet existing approaches largely fall into two extremes: specialist models trained on single-domain data, which capture domain-specific details but generalize poorly, and generalist medical VLMs trained on multi-domain data, which retain broad semantics but dilute fine-grained diagnostic cues. Bridging this specialization-generalization trade-off remains challenging. To address this problem, we propose ACE-LoRA, a parameter-efficient adaptation framework for generalist medical VLMs that maintains robust zero-shot generalization. ACE-LoRA integrates Low-Rank Adaptation (LoRA) modules into frozen image-text encoders and introduces an Attention-based Context Enhancement Hypergraph Neural Network (ACE-HGNN) module that captures higher-order contextual interactions beyond pairwise similarity to enrich global representations with localized diagnostic cues, addressing a key limitation of prior Parameter-Efficient Fine-Tuning (PEFT) methods that overlook fine-grained details. To further enhance cross-modal alignment, we formulate a label-guided InfoNCE loss to effectively suppress false negatives between semantically related image-text pairs. Despite adding only 0.95M trainable parameters, ACE-LoRA consistently outperforms state-of-the-art medical VLMs and PEFT baselines across zero-shot classification, segmentation, and detection benchmarks spanning multiple domains. Our code is available at https://github.com/icon-lab/ACE-LoRA.
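The abstract describes a label-guided InfoNCE loss that suppresses false negatives between semantically related image-text pairs, but does not give its exact formulation. Below is a minimal PyTorch sketch of one common way to realize this idea: within a batch, any image-text pair sharing a class label is treated as a positive rather than a negative. The function name, the use of class labels as the guidance signal, and the symmetric two-direction averaging are all assumptions for illustration, not the paper's definitive implementation.

```python
import torch
import torch.nn.functional as F


def label_guided_info_nce(img_emb, txt_emb, labels, temperature=0.07):
    """Hypothetical sketch of a label-guided InfoNCE loss.

    Unlike standard InfoNCE, pairs (i, j) with matching labels are
    counted as positives, so semantically related image-text pairs in
    the batch are not penalized as false negatives.
    """
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature          # (B, B) similarities

    # Positive mask: same class label => treat as a positive pair.
    # The diagonal (each sample with itself) is always included.
    pos = (labels[:, None] == labels[None, :]).float()    # (B, B)

    # Image-to-text direction: average log-likelihood over all positives.
    log_prob_i2t = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    loss_i2t = -(pos * log_prob_i2t).sum(1) / pos.sum(1)

    # Symmetric text-to-image direction.
    log_prob_t2i = logits.t() - torch.logsumexp(logits.t(), dim=1, keepdim=True)
    loss_t2i = -(pos * log_prob_t2i).sum(1) / pos.sum(1)

    return 0.5 * (loss_i2t + loss_t2i).mean()
```

With a label tensor of all-distinct values, `pos` reduces to the identity matrix and the loss collapses to the usual symmetric CLIP-style InfoNCE, which makes the label-guided variant a strict generalization.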