Singular Value Few-shot Adaptation of Vision-Language Models
September 3, 2025
Authors: Taha Koleilat, Hassan Rivaz, Yiming Xiao
cs.AI
Abstract
Vision-language models (VLMs) like CLIP have shown impressive zero-shot and
few-shot learning capabilities across diverse applications. However, adapting
these models to new fine-grained domains remains difficult due to reliance on
prompt engineering and the high cost of full model fine-tuning. Existing
adaptation approaches rely on augmented components, such as prompt tokens and
adapter modules, which could limit adaptation quality, destabilize the model,
and compromise the rich knowledge learned during pretraining. In this work, we
present CLIP-SVD, a novel multi-modal and
parameter-efficient adaptation technique that leverages Singular Value
Decomposition (SVD) to modify the internal parameter space of CLIP without
injecting additional modules. Specifically, we fine-tune only the singular
values of the CLIP parameter matrices to rescale the basis vectors for domain
adaptation while retaining the pretrained model. This design enables enhanced
adaptation performance using only 0.04% of the model's total
parameters and better preservation of its generalization ability. CLIP-SVD
achieves state-of-the-art classification results on 11 natural and 10
biomedical datasets, outperforming previous methods in both accuracy and
generalization under few-shot settings. Additionally, we leverage a natural
language-based approach to analyze the effectiveness and dynamics of CLIP
adaptation, making CLIP-SVD interpretable. The code is publicly
available at https://github.com/HealthX-Lab/CLIP-SVD.
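
To make the core idea concrete, below is a minimal PyTorch sketch of singular-value-only fine-tuning. This is not the authors' released implementation (see the GitHub repository for that); the `SVDLinear` wrapper and the `wrap_linear_layers` helper are illustrative assumptions. The sketch decomposes a pretrained weight matrix with SVD, freezes the singular vectors, and exposes only the singular values as trainable parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SVDLinear(nn.Module):
    """Hypothetical wrapper: keep a pretrained linear layer's singular
    vectors frozen and train only its singular values."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        # Decompose the pretrained weight: W = U @ diag(S) @ Vh.
        U, S, Vh = torch.linalg.svd(linear.weight.data, full_matrices=False)
        # Singular vectors (the pretrained basis) stay frozen as buffers.
        self.register_buffer("U", U)
        self.register_buffer("Vh", Vh)
        # Only the singular values are trainable; tuning them rescales
        # the basis vectors without rotating them.
        self.S = nn.Parameter(S.clone())
        # The bias stays frozen, as in the pretrained model.
        bias = linear.bias.data.clone() if linear.bias is not None else None
        self.register_buffer("bias", bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Rebuild the weight from the (possibly rescaled) singular values.
        weight = self.U @ torch.diag(self.S) @ self.Vh
        return F.linear(x, weight, self.bias)

def wrap_linear_layers(model: nn.Module) -> None:
    """Hypothetical helper: freeze a model, then replace every nn.Linear
    with an SVDLinear so only singular values receive gradients."""
    for param in model.parameters():
        param.requires_grad_(False)
    for name, module in model.named_children():
        if isinstance(module, nn.Linear):
            setattr(model, name, SVDLinear(module))
        else:
            wrap_linear_layers(module)
```

Under this scheme, each weight matrix contributes only min(out_features, in_features) trainable scalars rather than its full parameter count, which is consistent with the abstract's claim of adapting roughly 0.04% of the model's parameters while leaving the pretrained basis intact.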