Adapting Vision-Language Models Without Labels: A Comprehensive Survey

Abstract

Vision-Language Models (VLMs) have demonstrated remarkable generalization capabilities across a wide range of tasks. However, their performance often remains suboptimal when they are applied directly to specific downstream scenarios without task-specific adaptation. To enhance their utility while preserving data efficiency, recent research has increasingly focused on unsupervised adaptation methods that do not rely on labeled data. Despite growing interest in this area, a unified, task-oriented survey dedicated to unsupervised VLM adaptation is still lacking. To bridge this gap, we present a comprehensive and structured overview of the field. We propose a taxonomy based on the availability and nature of unlabeled visual data, which categorizes existing approaches into four key paradigms: Data-Free Transfer (no data), Unsupervised Domain Transfer (abundant data), Episodic Test-Time Adaptation (batch data), and Online Test-Time Adaptation (streaming data). Within this framework, we analyze the core methodologies and adaptation strategies associated with each paradigm, aiming to establish a systematic understanding of the field. Additionally, we review representative benchmarks across diverse applications and highlight open challenges and promising directions for future research. An actively maintained repository of relevant literature is available at https://github.com/tim-learn/Awesome-LabelFree-VLMs.
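The taxonomy keys each paradigm to the regime of unlabeled visual data that is available. The sketch below encodes that mapping as a lookup table. The four regime-to-paradigm pairs come from the abstract. The names `DataRegime`, `classify`, and `infer_regime`, and the flags used to infer a regime (`streaming`, `offline`), are hypothetical illustrations and are not taken from the survey.

```python
from enum import Enum


class DataRegime(Enum):
    """Availability of unlabeled visual data, as named in the taxonomy."""
    NONE = "no data"
    ABUNDANT = "abundant data"
    BATCH = "batch data"
    STREAMING = "streaming data"


# Regime -> paradigm pairs, as listed in the abstract.
PARADIGM = {
    DataRegime.NONE: "Data-Free Transfer",
    DataRegime.ABUNDANT: "Unsupervised Domain Transfer",
    DataRegime.BATCH: "Episodic Test-Time Adaptation",
    DataRegime.STREAMING: "Online Test-Time Adaptation",
}


def classify(regime: DataRegime) -> str:
    """Return the adaptation paradigm for a given data regime."""
    return PARADIGM[regime]


def infer_regime(num_unlabeled: int, streaming: bool, offline: bool) -> DataRegime:
    """Hypothetical rule for picking a regime from a deployment description.

    Assumptions (illustrative only, not from the survey):
    - streaming=True means test samples arrive one after another over time;
    - offline=True means the unlabeled data set is available before deployment.
    """
    if num_unlabeled == 0:
        return DataRegime.NONE
    if streaming:
        return DataRegime.STREAMING
    if offline:
        return DataRegime.ABUNDANT
    return DataRegime.BATCH
```

For example, `classify(infer_regime(64, streaming=False, offline=False))` returns `"Episodic Test-Time Adaptation"`. A real method may not fit one regime cleanly. The inference rule only shows how the axis of data availability separates the four paradigms.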
