Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models

Abstract

Text-to-image generative models have enabled high-resolution image synthesis across different domains, but they require users to specify the content they wish to generate. In this paper, we consider the inverse problem: given a collection of different images, can we discover the generative concepts that represent each image? We present an unsupervised approach to discovering generative concepts from a collection of images. The approach disentangles different art styles in paintings, separates objects and lighting in kitchen scenes, and discovers image classes from ImageNet images. We show that such generative concepts can accurately represent the content of images, can be recombined and composed to generate new artistic and hybrid images, and can further serve as a representation for downstream classification tasks.
