PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization
July 27, 2023
Authors: Junhyeong Cho, Gilhyun Nam, Sungyeon Kim, Hunmin Yang, Suha Kwak
cs.AI
Abstract
In a joint vision-language space, a text feature (e.g., from "a photo of a
dog") can effectively represent its relevant image features (e.g., from dog
photos). Inspired by this, we propose PromptStyler, which simulates various
distribution shifts in the joint space by synthesizing diverse styles via
prompts, without using any images, to tackle source-free domain
generalization. Our method learns to generate a variety of style features (from
"a S* style of a") via learnable style word vectors for pseudo-words S*. To
ensure that the learned styles do not distort content information, we force
style-content features (from "a S* style of a [class]") to lie near
their corresponding content features (from "[class]") in the joint
vision-language space. After learning the style word vectors, we train a linear
classifier on the synthesized style-content features. Although it requires no
images and takes only about 30 minutes to train on a single GPU, PromptStyler
achieves state-of-the-art results on PACS, VLCS, OfficeHome, and DomainNet.
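The two constraints described in the abstract can be sketched numerically: a content-consistency term keeps each style-content feature near its content feature, and a diversity term pushes distinct style features apart, after which a linear classifier is fit on the synthesized features. This is a minimal illustration, not the authors' implementation: a tiny frozen random projection (`W_enc`, `encode`) stands in for CLIP's text encoder, the dimensions are toy sizes, and one ridge-regression step stands in for the paper's classifier training.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB, FEAT = 16, 8          # word-embedding dim, joint-space dim (toy sizes)
K, C = 4, 3                # number of learnable styles, number of classes

# Frozen stand-in for a pretrained text encoder: a fixed linear
# projection followed by L2 normalization (a loud simplification).
W_enc = rng.normal(size=(EMB, FEAT))

def encode(word_vecs):
    """Map a set of word vectors to one L2-normalized text feature."""
    feat = word_vecs.mean(axis=0) @ W_enc
    return feat / np.linalg.norm(feat)

class_words = rng.normal(size=(C, EMB))        # embeddings for "[class]"
style_words = rng.normal(size=(K, EMB)) * 0.1  # learnable pseudo-words S*

# Content features from "[class]" prompts.
content_feats = np.stack([encode(class_words[c][None]) for c in range(C)])

# Style-content features from "a S* style of a [class]" prompts.
sc_feats = np.stack([[encode(np.stack([style_words[k], class_words[c]]))
                      for c in range(C)] for k in range(K)])

# Content-consistency loss: each style-content feature should have high
# cosine similarity with its corresponding content feature.
cos_sc = np.einsum('kcd,cd->kc', sc_feats, content_feats)
loss_content = (1.0 - cos_sc).mean()

# Style-diversity loss: penalize pairwise cosine similarity among style
# features from "a S* style of a" so the K styles spread apart.
style_feats = np.stack([encode(style_words[k][None]) for k in range(K)])
gram = style_feats @ style_feats.T
loss_diverse = np.abs(gram[~np.eye(K, dtype=bool)]).mean()

# After style learning, a linear classifier is trained on the synthesized
# style-content features (here: one ridge-regression solve as a stand-in).
X = sc_feats.reshape(K * C, FEAT)
Y = np.tile(np.eye(C), (K, 1))                 # one-hot class labels
W_cls = np.linalg.solve(X.T @ X + 1e-3 * np.eye(FEAT), X.T @ Y)
pred = (X @ W_cls).argmax(axis=1)
```

In a real setup the encoder is CLIP's frozen text encoder, only the K style word vectors receive gradients during style learning, and the classifier is then trained on the frozen synthesized features.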