Text2Layer: Layered Image Generation using Latent Diffusion Model

Abstract

Layer compositing is one of the most popular image editing workflows among both amateurs and professionals. Motivated by the success of diffusion models, we explore layer compositing from a layered image generation perspective. Instead of generating a single image, we propose to generate the background, foreground, layer mask, and composed image simultaneously. To achieve layered image generation, we train an autoencoder that can reconstruct layered images and train diffusion models on its latent representation. One benefit of this problem formulation is that it enables better compositing workflows in addition to high-quality image output. Another benefit is that it produces higher-quality layer masks than those obtained from a separate image segmentation step. Experimental results show that the proposed method generates high-quality layered images and establishes a benchmark for future work.
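
The four outputs named in the abstract (background, foreground, layer mask, composed image) are related by compositing. The following is a minimal sketch, assuming standard per-pixel alpha blending; the abstract does not state the exact compositing formulation, so the function name `composite` and the nested-list image representation are illustrative assumptions, not the authors' implementation.

```python
from typing import List

Pixel = List[float]          # one RGB pixel, channels in [0, 1]
Image = List[List[Pixel]]    # rows of pixels
Mask = List[List[float]]     # per-pixel foreground opacity in [0, 1]


def composite(foreground: Image, background: Image, mask: Mask) -> Image:
    """Blend two layers with standard alpha blending (an assumption here):
    out = m * fg + (1 - m) * bg, applied per pixel and per channel."""
    return [
        [
            [m * f + (1.0 - m) * b for f, b in zip(fg_px, bg_px)]
            for fg_px, bg_px, m in zip(fg_row, bg_row, mask_row)
        ]
        for fg_row, bg_row, mask_row in zip(foreground, background, mask)
    ]


# Toy 1x2 image: a red foreground over a blue background.
# The mask keeps the foreground in the first pixel and the background in the second.
fg = [[[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]]
bg = [[[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]]
mask = [[1.0, 0.0]]
composed = composite(fg, bg, mask)
```

Under this assumption, a soft mask value such as 0.5 mixes the two layers equally, which is why mask quality at object boundaries directly affects the composed image.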
