VPA: Fully Test-Time Visual Prompt Adaptation
September 26, 2023
Authors: Jiachen Sun, Mark Ibrahim, Melissa Hall, Ivan Evtimov, Z. Morley Mao, Cristian Canton Ferrer, Caner Hazirbas
cs.AI
Abstract
Textual prompt tuning has demonstrated significant performance improvements
in adapting natural language processing models to a variety of downstream tasks
by treating hand-engineered prompts as trainable parameters. Inspired by the
success of textual prompting, several studies have investigated the efficacy of
visual prompt tuning. In this work, we present Visual Prompt Adaptation (VPA),
the first framework that generalizes visual prompting with test-time
adaptation. VPA introduces a small number of learnable tokens, enabling fully
test-time and storage-efficient adaptation without necessitating source-domain
information. We examine our VPA design under diverse adaptation settings,
encompassing single-image, batched-image, and pseudo-label adaptation. We
evaluate VPA on multiple tasks, including out-of-distribution (OOD)
generalization, corruption robustness, and domain adaptation. Experimental
results reveal that VPA effectively enhances OOD generalization by 3.3% across
various models, surpassing previous test-time approaches. Furthermore, we show
that VPA improves corruption robustness by 6.5% compared to strong baselines.
Finally, we demonstrate that VPA also boosts domain adaptation performance by a relative 5.2%. Our VPA also exhibits marked effectiveness in improving the robustness of zero-shot recognition for vision-language models.
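The core mechanism the abstract describes, prepending a few learnable prompt tokens to a frozen backbone's input and updating only those tokens at inference time (e.g. by minimizing prediction entropy, a common test-time adaptation objective), can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the linear "backbone", all shapes, the `adapt_prompt` helper, and the finite-difference gradients are assumptions made to keep the example dependency-free.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a frozen ViT: prompt tokens are concatenated with the
# image's patch tokens, flattened, and passed through a fixed linear head.
# All names and shapes here are illustrative, not taken from the paper.
NUM_PROMPT, NUM_PATCH, DIM, NUM_CLASSES = 2, 4, 8, 3
W = rng.normal(size=((NUM_PROMPT + NUM_PATCH) * DIM, NUM_CLASSES))

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def predict(prompt, patches):
    # Prepend the learnable prompt tokens to the image's patch tokens.
    tokens = np.concatenate([prompt, patches], axis=0).reshape(-1)
    return softmax(tokens @ W)

def entropy(p):
    return -np.sum(p * np.log(p + 1e-12))

def adapt_prompt(patches, lr=0.1, steps=30, eps=1e-4):
    """Single-image test-time adaptation: minimize prediction entropy
    with respect to the prompt tokens only; the backbone (W) stays frozen.
    Gradients are estimated by finite differences to keep the sketch
    dependency-free; a real implementation would use autodiff."""
    prompt = np.zeros((NUM_PROMPT, DIM))
    for _ in range(steps):
        base = entropy(predict(prompt, patches))
        grad = np.zeros_like(prompt)
        for idx in np.ndindex(prompt.shape):
            bumped = prompt.copy()
            bumped[idx] += eps
            grad[idx] = (entropy(predict(bumped, patches)) - base) / eps
        candidate = prompt - lr * grad
        if entropy(predict(candidate, patches)) < base:
            prompt = candidate  # accept the descent step
        else:
            lr *= 0.5           # backtrack on overshoot
    return prompt

patches = rng.normal(size=(NUM_PATCH, DIM))
h_before = entropy(predict(np.zeros((NUM_PROMPT, DIM)), patches))
prompt = adapt_prompt(patches)
h_after = entropy(predict(prompt, patches))
print(f"entropy before: {h_before:.4f}, after: {h_after:.4f}")
```

Because only the small prompt tensor is updated and stored per domain, adaptation is storage-efficient and needs no source-domain data, which is the property the abstract emphasizes.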