Image Generation with a Sphere Encoder

Abstract

We introduce the Sphere Encoder, an efficient generative framework that can produce images in a single forward pass and compete with many-step diffusion models using fewer than five steps. Our approach learns an encoder that maps natural images uniformly onto a spherical latent space, and a decoder that maps random latent vectors back to the image space. Trained solely with image reconstruction losses, the model generates an image by simply decoding a random point on the sphere. Our architecture naturally supports conditional generation, and looping the encoder/decoder a few times can further enhance image quality. Across several datasets, the Sphere Encoder approach yields performance competitive with state-of-the-art diffusion models, at a small fraction of the inference cost. The project page is available at https://sphere-encoder.github.io.
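
The sampling procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the unit radius, the latent dimension, the function names (`sample_on_sphere`, `project_to_sphere`, `generate`), and the toy encoder/decoder are all assumptions made for illustration. In particular, re-projecting the encoder output onto the sphere inside the loop is one plausible reading of "looping the encoder/decoder", not a detail confirmed by the abstract.

```python
import math
import random


def project_to_sphere(v, radius=1.0):
    """Rescale a nonzero vector so that it lies on the sphere of the given radius."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot project the zero vector onto a sphere")
    return [radius * x / norm for x in v]


def sample_on_sphere(dim, rng, radius=1.0):
    """Draw a point uniformly from the sphere in R^dim.

    Normalizing an isotropic Gaussian vector gives a uniformly
    distributed direction, which is a standard construction.
    """
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return project_to_sphere(v, radius)


def generate(decoder, encoder, dim, rng, loops=0):
    """Decode a random point on the sphere, then optionally refine.

    With loops=0 this is a single decoder pass. Each extra loop
    re-encodes the current image, projects the latent back onto the
    sphere (an assumption), and decodes again.
    """
    z = sample_on_sphere(dim, rng)
    image = decoder(z)
    for _ in range(loops):
        z = project_to_sphere(encoder(image))
        image = decoder(z)
    return image


# Hypothetical stand-ins for trained networks, used only to make the
# sketch executable: the "decoder" doubles each coordinate and the
# "encoder" returns its input unchanged.
def toy_decoder(z):
    return [2.0 * x for x in z]


def toy_encoder(image):
    return list(image)


if __name__ == "__main__":
    rng = random.Random(0)
    sample = generate(toy_decoder, toy_encoder, dim=16, rng=rng, loops=3)
    print(len(sample))
```

In this sketch, `loops=0` corresponds to the single-forward-pass setting, and a small positive `loops` value corresponds to the few-step setting mentioned in the abstract. A real model would replace the toy functions with trained networks.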