PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization

Abstract

In a joint vision-language space, a text feature (e.g., from "a photo of a dog") can effectively represent its relevant image features (e.g., from dog photos). Inspired by this, we propose PromptStyler, which addresses source-free domain generalization by simulating various distribution shifts in the joint space: it synthesizes diverse styles via prompts, without using any images. Our method learns to generate a variety of style features (from "a S* style of a") via learnable style word vectors for pseudo-words S*. To ensure that learned styles do not distort content information, we force style-content features (from "a S* style of a [class]") to lie near their corresponding content features (from "[class]") in the joint vision-language space. After learning the style word vectors, we train a linear classifier using the synthesized style-content features. PromptStyler achieves state-of-the-art results on PACS, VLCS, OfficeHome, and DomainNet, even though it requires no images and takes only about 30 minutes to train on a single GPU.
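
The two objectives named in the abstract (styles should be diverse, and a style should not pull a class feature away from its content feature) can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: cosine similarity as the measure in the joint space, a mean absolute cosine as the diversity penalty, and a softmax cross-entropy as the consistency term. The function names are hypothetical, and the hand-made 3-d vectors stand in for text-encoder outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def style_diversity_penalty(new_style, previous_styles):
    """Mean |cosine| between a new style feature and earlier ones.

    0.0 means the new style is orthogonal to every earlier style
    (assumed here to be the notion of "diverse").
    """
    if not previous_styles:
        return 0.0
    total = sum(abs(cosine(new_style, s)) for s in previous_styles)
    return total / len(previous_styles)


def content_consistency_loss(style_content_feats, content_feats):
    """Cross-entropy that pulls each style-content feature toward its own class.

    style_content_feats[c] stands in for the feature of "a S* style of a [class c]";
    content_feats[c] stands in for the feature of "[class c]".
    """
    total = 0.0
    for c, feat in enumerate(style_content_feats):
        sims = [cosine(feat, content) for content in content_feats]
        log_norm = math.log(sum(math.exp(s) for s in sims))
        total += -(sims[c] - log_norm)
    return total / len(style_content_feats)


# Toy features, made up for illustration (not real encoder outputs).
content = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]      # "[dog]", "[cat]"
faithful = [[0.9, 0.1, 0.3], [0.1, 0.9, 0.3]]     # style keeps class identity
distorted = [[0.1, 0.9, 0.3], [0.9, 0.1, 0.3]]    # style swaps class identity

loss_faithful = content_consistency_loss(faithful, content)
loss_distorted = content_consistency_loss(distorted, content)
penalty_orthogonal = style_diversity_penalty([0.0, 0.0, 1.0], [[1.0, 0.0, 0.0]])
penalty_duplicate = style_diversity_penalty([1.0, 0.0, 0.0], [[1.0, 0.0, 0.0]])
```

In this toy setting the consistency loss is lower for the style that keeps each class near its own content feature than for the style that swaps them, and the diversity penalty is zero for an orthogonal style and maximal for a repeated one. In the method described above, such terms would be minimized with respect to the learnable style word vectors; the sketch only evaluates them on fixed vectors.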
