Semantica: An Adaptable Image-Conditioned Diffusion Model

May 23, 2024
Authors: Manoj Kumar, Neil Houlsby, Emiel Hoogeboom
cs.AI

Abstract

We investigate the task of adapting image generative models to different datasets without fine-tuning. To this end, we introduce Semantica, an image-conditioned diffusion model capable of generating images based on the semantics of a conditioning image. Semantica is trained exclusively on web-scale image pairs: it receives a random image from a webpage as conditional input and models another random image from the same webpage. Our experiments highlight the expressivity of pretrained image encoders and the necessity of semantic-based data filtering in achieving high-quality image generation. Once trained, Semantica can adaptively generate new images from a dataset by simply using images from that dataset as input. We study the transfer properties of Semantica on ImageNet, LSUN Churches, LSUN Bedroom and SUN397.
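
The pairing scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `pages` mapping, the file names, and the `sample_image_pair` function are hypothetical, and it covers only the pair-sampling step (one image from a webpage as the condition, a different image from the same webpage as the target), not the encoder, the semantic filtering, or the diffusion model.

```python
import random


def sample_image_pair(pages, rng):
    """Return (condition, target): two distinct images from one webpage.

    `pages` is a hypothetical mapping from page URL to a list of image ids.
    """
    # Only pages with at least two images can yield a pair.
    eligible = [url for url, images in pages.items() if len(images) >= 2]
    if not eligible:
        raise ValueError("no page has two or more images")
    url = rng.choice(eligible)
    # Sample without replacement so the target differs from the condition.
    condition, target = rng.sample(pages[url], 2)
    return condition, target


# Toy data standing in for a web-scale crawl (illustrative only).
pages = {
    "https://example.com/a": ["a1.jpg", "a2.jpg", "a3.jpg"],
    "https://example.com/b": ["b1.jpg"],
    "https://example.com/c": ["c1.jpg", "c2.jpg"],
}

condition, target = sample_image_pair(pages, random.Random(0))
print(condition, target)
```

In this sketch, adapting to a new dataset at inference time would amount to passing images from that dataset as the condition; the model itself is outside the scope of the example.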
