Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models

Abstract

Text-to-image generative models have enabled high-resolution image synthesis across different domains, but they require users to specify the content they wish to generate. In this paper, we consider the inverse problem: given a collection of different images, can we discover the generative concepts that represent each image? We present an unsupervised approach to discover generative concepts from a collection of images, disentangling different art styles in paintings, separating objects and lighting in kitchen scenes, and discovering image classes from ImageNet images. We show that these generative concepts can accurately represent the content of images, can be recombined and composed to generate new artistic and hybrid images, and can further serve as a representation for downstream classification tasks.
