Self-conditioned Image Generation via Generating Representations

December 6, 2023
Authors: Tianhong Li, Dina Katabi, Kaiming He
cs.AI

Abstract

This paper presents Representation-Conditioned image Generation (RCG), a simple yet effective image generation framework which sets a new benchmark in class-unconditional image generation. RCG does not condition on any human annotations. Instead, it conditions on a self-supervised representation distribution which is mapped from the image distribution using a pre-trained encoder. During generation, RCG samples from this representation distribution using a representation diffusion model (RDM), and employs a pixel generator to craft image pixels conditioned on the sampled representation. Such a design provides substantial guidance during the generative process, resulting in high-quality image generation. Tested on ImageNet 256×256, RCG achieves a Fréchet Inception Distance (FID) of 3.31 and an Inception Score (IS) of 253.4. These results not only significantly improve the state-of-the-art of class-unconditional image generation but also rival the current leading methods in class-conditional image generation, bridging the long-standing performance gap between these two tasks. Code is available at https://github.com/LTH14/rcg.
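
To make the two-stage pipeline in the abstract concrete, below is a minimal sketch of the sampling procedure: an RDM first samples a self-supervised representation, and a pixel generator then produces an image conditioned on it. The interfaces `rdm.denoise`, `pixel_generator.generate`, and the representation dimensionality are hypothetical stand-ins, not the API of the released code; see https://github.com/LTH14/rcg for the actual implementation.

```python
import torch

@torch.no_grad()
def rcg_sample(rdm, pixel_generator, num_images, rep_dim=256):
    """Generate images without class labels, conditioning only on
    self-supervised representations sampled by the RDM.

    Both model arguments use hypothetical interfaces for illustration.
    """
    # Stage 1: sample from the representation distribution. The RDM is
    # a diffusion model over low-dimensional representation vectors
    # (originally obtained by a pre-trained self-supervised encoder),
    # so it starts from Gaussian noise and denoises it.
    noise = torch.randn(num_images, rep_dim)
    reps = rdm.denoise(noise)  # hypothetical call

    # Stage 2: condition the pixel generator on the sampled
    # representations to produce image pixels. Per the paper, any
    # conditional pixel generator can fill this role.
    images = pixel_generator.generate(cond=reps)  # hypothetical call
    return images
```

Because the representation vectors are low-dimensional compared to pixels, the RDM can presumably stay lightweight, while the heavy lifting of synthesis is left to the conditioned pixel generator.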