
VPA: Fully Test-Time Visual Prompt Adaptation

September 26, 2023
Authors: Jiachen Sun, Mark Ibrahim, Melissa Hall, Ivan Evtimov, Z. Morley Mao, Cristian Canton Ferrer, Caner Hazirbas
cs.AI

Abstract

Textual prompt tuning has demonstrated significant performance improvements in adapting natural language processing models to a variety of downstream tasks by treating hand-engineered prompts as trainable parameters. Inspired by the success of textual prompting, several studies have investigated the efficacy of visual prompt tuning. In this work, we present Visual Prompt Adaptation (VPA), the first framework that generalizes visual prompting with test-time adaptation. VPA introduces a small number of learnable tokens, enabling fully test-time and storage-efficient adaptation without necessitating source-domain information. We examine our VPA design under diverse adaptation settings, encompassing single-image, batched-image, and pseudo-label adaptation. We evaluate VPA on multiple tasks, including out-of-distribution (OOD) generalization, corruption robustness, and domain adaptation. Experimental results reveal that VPA effectively enhances OOD generalization by 3.3% across various models, surpassing previous test-time approaches. Furthermore, we show that VPA improves corruption robustness by 6.5% compared to strong baselines. Finally, we demonstrate that VPA also boosts domain adaptation performance by a relative 5.2%. Our VPA also exhibits marked effectiveness in improving the robustness of zero-shot recognition for vision-language models.
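The abstract's core idea — a handful of learnable tokens adapted at test time while the backbone stays frozen — can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify where the prompt tokens are inserted or which adaptation objective is used, so this sketch assumes input-level prompts on a ViT-style encoder and entropy minimization (a common choice in test-time adaptation), with hypothetical names `VisualPromptWrapper` and `adapt_on_batch`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VisualPromptWrapper(nn.Module):
    """Frozen ViT-style encoder plus a few learnable visual prompt tokens.

    Hypothetical sketch: prompts are prepended to the patch embeddings;
    only the prompt tokens are trainable, which keeps adaptation
    storage-efficient and requires no source-domain data.
    """

    def __init__(self, encoder, embed_dim=768, num_prompts=4):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():   # freeze the backbone
            p.requires_grad_(False)
        # The only trainable parameters: a small set of prompt tokens.
        self.prompts = nn.Parameter(torch.randn(num_prompts, embed_dim) * 0.02)

    def forward(self, patch_embeds):
        # patch_embeds: (B, N, D) patch embeddings from the model's stem
        b = patch_embeds.size(0)
        prompts = self.prompts.unsqueeze(0).expand(b, -1, -1)
        return self.encoder(torch.cat([prompts, patch_embeds], dim=1))


def adapt_on_batch(model, patch_embeds, steps=1, lr=1e-3):
    """One batched-image adaptation episode (assumed objective:
    minimize prediction entropy w.r.t. the prompt tokens only)."""
    opt = torch.optim.SGD([model.prompts], lr=lr)
    for _ in range(steps):
        logits = model(patch_embeds)
        probs = F.softmax(logits, dim=-1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1).mean()
        opt.zero_grad()
        entropy.backward()
        opt.step()
    return model(patch_embeds)
```

Under this reading, single-image adaptation is the batch-size-1 case, and pseudo-label adaptation would replace the entropy term with a cross-entropy loss against the model's own hard predictions.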

