Text2Layer: Layered Image Generation using Latent Diffusion Model

Abstract

Layer compositing is one of the most popular image editing workflows among both amateurs and professionals. Motivated by the success of diffusion models, we explore layer compositing from a layered image generation perspective. Instead of generating a single image, we propose to generate the background, foreground, layer mask, and composed image simultaneously. To achieve layered image generation, we train an autoencoder that can reconstruct layered images and train diffusion models on its latent representation. One benefit of this formulation is that it enables better compositing workflows in addition to high-quality image output. Another is that it produces higher-quality layer masks than those obtained from a separate image segmentation step. Experimental results show that the proposed method generates high-quality layered images and establishes a benchmark for future work.
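To make the relationship between the four outputs concrete, the sketch below blends a foreground over a background using a layer mask. The abstract does not state the compositing formula, so this assumes standard per-pixel alpha blending (composed = mask * foreground + (1 - mask) * background). The function name `composite` and the tiny example image are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Pixel = Tuple[float, ...]  # RGB channel values in [0, 1]


def composite(foreground: List[List[Pixel]],
              background: List[List[Pixel]],
              mask: List[List[float]]) -> List[List[Pixel]]:
    """Blend foreground over background with a per-pixel mask in [0, 1].

    Assumption: standard alpha blending, not necessarily the paper's
    exact formulation.
    """
    composed = []
    for fg_row, bg_row, m_row in zip(foreground, background, mask):
        row = []
        for fg, bg, m in zip(fg_row, bg_row, m_row):
            # Mask value 1.0 keeps the foreground, 0.0 keeps the background.
            row.append(tuple(m * f + (1.0 - m) * b for f, b in zip(fg, bg)))
        composed.append(row)
    return composed


# Hypothetical 1x2 image: red foreground over blue background.
fg = [[(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
bg = [[(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]]
mask = [[1.0, 0.0]]  # left pixel is foreground, right pixel is background
image = composite(fg, bg, mask)
```

Because the layers are kept separate, the foreground or background can be swapped and the image recomposed without running a segmentation step again, which is the kind of workflow the abstract points to.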
