Self-conditioned Image Generation via Generating Representations

Abstract

This paper presents Representation-Conditioned Image Generation (RCG), a simple yet effective image generation framework that sets a new benchmark in class-unconditional image generation. RCG does not condition on any human annotations. Instead, it conditions on a self-supervised representation distribution that is mapped from the image distribution using a pre-trained encoder. During generation, RCG samples from this representation distribution using a representation diffusion model (RDM), and employs a pixel generator to produce image pixels conditioned on the sampled representation. This design provides substantial guidance during the generative process, resulting in high-quality image generation. Tested on ImageNet 256×256, RCG achieves a Fréchet Inception Distance (FID) of 3.31 and an Inception Score (IS) of 253.4. These results not only significantly improve the state of the art in class-unconditional image generation but also rival the current leading methods in class-conditional image generation, bridging the long-standing performance gap between these two tasks. Code is available at https://github.com/LTH14/rcg.
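
The two-stage generation process described in the abstract can be illustrated with a minimal sketch of the data flow. It is not the released implementation: the function names, the toy denoiser, and the toy pixel generator are all hypothetical stand-ins for the learned RDM and pixel generator, chosen only so that the example runs on the Python standard library.

```python
import random
from typing import Callable, List

Vector = List[float]
Image = List[List[float]]


def sample_representation(
    denoise_step: Callable[[Vector, int], Vector],
    dim: int,
    num_steps: int,
    rng: random.Random,
) -> Vector:
    """Stage 1: draw a representation by iteratively denoising Gaussian noise."""
    rep = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(num_steps)):
        rep = denoise_step(rep, t)
    return rep


def generate_image(pixel_generator: Callable[[Vector], Image], rep: Vector) -> Image:
    """Stage 2: produce pixels conditioned on the sampled representation."""
    return pixel_generator(rep)


# Toy stand-ins (assumptions for illustration, not the paper's models).
def toy_denoise_step(rep: Vector, t: int) -> Vector:
    # Placeholder for a learned denoiser: shrink the vector slightly each step.
    return [0.9 * x for x in rep]


def toy_pixel_generator(rep: Vector, size: int = 4) -> Image:
    # Placeholder for a learned generator: tile the representation
    # into a size x size grid so the output depends only on `rep`.
    return [
        [rep[(r * size + c) % len(rep)] for c in range(size)]
        for r in range(size)
    ]


rng = random.Random(0)
rep = sample_representation(toy_denoise_step, dim=8, num_steps=10, rng=rng)
image = generate_image(toy_pixel_generator, rep)
print(len(rep), len(image), len(image[0]))
```

No class label or other human annotation enters either stage: the only conditioning signal for the pixel generator is the representation sampled in the first stage.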
