Image Generation with a Sphere Encoder

Abstract

We introduce the Sphere Encoder, an efficient generative framework that can produce images in a single forward pass and competes with many-step diffusion models using fewer than five steps. Our approach learns an encoder that maps natural images uniformly onto a spherical latent space, and a decoder that maps random latent vectors back to image space. Trained solely with image reconstruction losses, the model generates an image by simply decoding a random point on the sphere. The architecture naturally supports conditional generation, and looping the encoder/decoder a few times can further improve image quality. Across several datasets, the Sphere Encoder achieves performance competitive with state-of-the-art diffusion models at a small fraction of the inference cost. Project page: https://sphere-encoder.github.io
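The sampling procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the unit radius, the re-projection of encoder outputs onto the sphere, and the toy linear encoder/decoder are all assumptions made so that the example runs. A real model would use trained networks in their place. The sketch draws a uniform point on the sphere by normalizing a Gaussian vector, decodes it, and optionally loops through the encoder and decoder a few times.

```python
import math
import random


def sample_on_sphere(dim, rng):
    """Draw a point uniformly from the unit sphere in R^dim.

    Normalizing an isotropic Gaussian vector gives a uniformly
    distributed direction. The unit radius is an assumption.
    """
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def project_to_sphere(v):
    """Rescale a nonzero vector to unit length (assumed latent constraint)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def generate(decoder, encoder, dim, rng, loops=0):
    """Decode a random spherical latent, then optionally re-encode
    and decode the result `loops` more times."""
    z = sample_on_sphere(dim, rng)
    image = decoder(z)
    for _ in range(loops):
        z = project_to_sphere(encoder(image))
        image = decoder(z)
    return image


# Hypothetical toy stand-ins for the trained networks.
def toy_decoder(z):
    return [2.0 * x for x in z]      # the "image" is a scaled latent


def toy_encoder(image):
    return [0.5 * x for x in image]  # inverse of the toy decoder


rng = random.Random(0)
image = generate(toy_decoder, toy_encoder, dim=8, rng=rng, loops=3)
```

With `loops=0` this corresponds to single-pass generation; a small positive `loops` value corresponds to the few-step refinement mentioned in the abstract, under the assumptions stated above.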

