PixelFlow: Pixel-Space Generative Models with Flow

Abstract

We present PixelFlow, a family of image generation models that operate directly in raw pixel space, in contrast to the predominant latent-space models. This approach simplifies the image generation process by eliminating the need for a pre-trained Variational Autoencoder (VAE) and making the whole model end-to-end trainable. Through efficient cascade flow modeling, PixelFlow keeps the computation cost in pixel space affordable, and it achieves an FID of 1.98 on the 256×256 ImageNet class-conditional image generation benchmark. Qualitative text-to-image results demonstrate that PixelFlow excels in image quality, artistry, and semantic control. We hope this new paradigm will inspire and open up new opportunities for next-generation visual generation models. Code and models are available at https://github.com/ShoufaChen/PixelFlow.
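The abstract credits cascade flow modeling with keeping pixel-space computation affordable. The following is a minimal sketch of the general coarse-to-fine idea. It relies on assumptions of our own, not the paper's method. Sampling runs over a hypothetical resolution schedule, and each stage integrates a flow ODE with plain Euler steps, using the rectified-flow convention (t = 0 is noise, t = 1 is data). Every later stage starts from the 2x-upsampled output of the previous stage, re-noised to a hypothetical start time `t_start_later`. `velocity` stands in for a trained network. `toy_velocity` is an oracle that pulls toward a fixed image, so the sketch can be checked. The `cost` counter (pixels processed per step) is only a rough proxy for compute.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour 2x upsampling of an (H, W, C) array.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def downsample(x, factor):
    # Box (average-pool) downsampling by an integer factor.
    h, w, c = x.shape
    return x.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

def cascade_sample(velocity, resolutions, steps_per_stage,
                   t_start_later=0.5, channels=3, seed=0):
    """Coarse-to-fine Euler sampling (hypothetical schedule, not PixelFlow's).

    Returns the final image and a pixel-step cost proxy.
    """
    rng = np.random.default_rng(seed)
    x, cost = None, 0
    for stage, res in enumerate(resolutions):
        noise = rng.standard_normal((res, res, channels))
        if stage == 0:
            t0 = 0.0
            x = noise  # first stage starts from pure noise at t = 0
        else:
            t0 = t_start_later
            # Assumption: re-noise the upsampled previous output to time t0.
            x = t0 * upsample2x(x) + (1.0 - t0) * noise
        ts = np.linspace(t0, 1.0, steps_per_stage + 1)
        for t, t_next in zip(ts[:-1], ts[1:]):
            x = x + (t_next - t) * velocity(x, t, stage)  # Euler step
            cost += res * res  # pixels processed in this step
    return x, cost

# Toy check: an oracle velocity that flows straight toward a fixed image.
target = np.random.default_rng(1).uniform(-1.0, 1.0, (16, 16, 3))
resolutions = [4, 8, 16]  # hypothetical stage resolutions
targets = [downsample(target, 16 // r) for r in resolutions]

def toy_velocity(x, t, stage):
    # Stand-in for a trained network (hypothetical): exact velocity toward
    # the stage's target along a straight path.
    return (targets[stage] - x) / (1.0 - t)

img, cost = cascade_sample(toy_velocity, resolutions, steps_per_stage=10)
print(cost, 30 * 16 * 16)  # → 3360 7680
```

The cascade and a single-resolution sampler both take 30 Euler steps. The cascade processes 3,360 pixel-steps, compared with 7,680 when every step runs at full resolution, because most of its steps run at low resolution. Actual savings would depend on the architecture (for example, attention cost grows faster than linearly with token count) and on how many steps each stage is given.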
