Image Generation with a Sphere Encoder

Abstract

We introduce the Sphere Encoder, an efficient generative framework that can produce images in a single forward pass and, with fewer than five steps, competes with many-step diffusion models. Our approach learns an encoder that maps natural images uniformly onto a spherical latent space, and a decoder that maps random latent vectors back to image space. Trained solely with image reconstruction losses, the model generates an image by simply decoding a random point on the sphere. The architecture naturally supports conditional generation, and looping the encoder and decoder a few times can further improve image quality. Across several datasets, the Sphere Encoder approach performs competitively with state-of-the-art diffusion models at a small fraction of the inference cost. The project page is available at https://sphere-encoder.github.io.
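The sampling procedure described in the abstract (decode a random point on the sphere, then optionally loop the encoder and decoder a few times) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `encoder` and `decoder` are hypothetical callables standing in for the trained networks, the latent space is assumed to be the unit sphere, and re-projecting the encoder output onto the sphere in each loop is our assumption as well.

```python
import math
import random


def project_to_sphere(v):
    """Rescale a vector to unit length (unit radius is an assumption)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sample_on_sphere(dim, rng):
    """Draw a point uniformly from the unit sphere in R^dim.

    Normalizing an isotropic Gaussian vector yields a uniformly
    distributed direction.
    """
    return project_to_sphere([rng.gauss(0.0, 1.0) for _ in range(dim)])


def generate(decoder, encoder, dim, refine_steps=0, rng=None):
    """One-pass generation with optional encoder/decoder refinement loops.

    decoder, encoder: hypothetical stand-ins for the trained networks.
    refine_steps: number of extra encode/decode loops (0 = single pass).
    """
    rng = rng or random.Random()
    z = sample_on_sphere(dim, rng)      # random latent on the sphere
    image = decoder(z)                  # single forward pass
    for _ in range(refine_steps):
        # Assumed refinement: re-encode the image, snap the latent back
        # onto the sphere, and decode again.
        z = project_to_sphere(encoder(image))
        image = decoder(z)
    return image


# Toy usage with placeholder "networks" that simply scale their input.
toy_decoder = lambda z: [2.0 * x for x in z]
toy_encoder = lambda img: [0.5 * x for x in img]
sample = generate(toy_decoder, toy_encoder, dim=4, refine_steps=3,
                  rng=random.Random(0))
```

With `refine_steps=0` the sketch corresponds to the single-forward-pass setting; small positive values correspond to the few-step setting the abstract mentions.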
