PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization

Abstract

In a joint vision-language space, a text feature (e.g., from "a photo of a dog") can effectively represent its relevant image features (e.g., from dog photos). Inspired by this, we propose PromptStyler, which addresses source-free domain generalization by simulating various distribution shifts in the joint space: it synthesizes diverse styles via prompts, without using any images. Our method learns to generate a variety of style features (from "a S* style of a") via learnable style word vectors for pseudo-words S*. To ensure that learned styles do not distort content information, we force style-content features (from "a S* style of a [class]") to be located near their corresponding content features (from "[class]") in the joint vision-language space. After learning the style word vectors, we train a linear classifier using synthesized style-content features. PromptStyler achieves state-of-the-art results on PACS, VLCS, OfficeHome and DomainNet, even though it requires no images and takes only ~30 minutes to train on a single GPU.
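The content-preservation constraint can be illustrated with a small sketch. The abstract only states that style-content features should lie near their corresponding content features; the version below assumes one plausible reading, a softmax cross-entropy over cosine similarities, and is not taken from the paper. The function names (`cosine`, `content_consistency_loss`) and the toy 2-D vectors are hypothetical stand-ins for real text-encoder features.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def content_consistency_loss(style_content_feats, content_feats):
    """Mean cross-entropy over classes (assumed form of the constraint).

    style_content_feats[m] is the feature of "a S* style of a [class m]";
    content_feats[m] is the feature of "[class m]". The loss is small when
    each style-content feature is most similar to its own class feature.
    """
    total = 0.0
    for m, sc in enumerate(style_content_feats):
        sims = [cosine(sc, c) for c in content_feats]
        # Log-sum-exp with max subtraction for numerical stability.
        peak = max(sims)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in sims))
        total += -(sims[m] - log_norm)
    return total / len(style_content_feats)


# Toy 2-D features for two classes (illustrative values only).
content = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # style keeps class content
swapped = [[0.1, 0.9], [0.9, 0.1]]   # style distorts class content

loss_aligned = content_consistency_loss(aligned, content)
loss_swapped = content_consistency_loss(swapped, content)
```

In this toy setting, `loss_aligned` is lower than `loss_swapped`, which is the behavior a content-preserving objective should have. In practice the features would come from a pretrained text encoder, and the loss would be minimized with respect to the style word vectors.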

