Text2Layer: Layered Image Generation using Latent Diffusion Model
July 19, 2023
Authors: Xinyang Zhang, Wentian Zhao, Xin Lu, Jeff Chien
cs.AI
Abstract
Layer compositing is one of the most popular image editing workflows among
both amateurs and professionals. Motivated by the success of diffusion models,
we explore layer compositing from a layered image generation perspective.
Instead of generating an image, we propose to generate background, foreground,
layer mask, and the composed image simultaneously. To achieve layered image
generation, we train an autoencoder that is able to reconstruct layered images
and train diffusion models on the latent representation. One benefit of the
proposed problem formulation is that it enables better compositing workflows
in addition to high-quality image output. Another is that it produces
higher-quality layer masks than those obtained from a separate image
segmentation step.
Experimental results show that the proposed method is able to generate
high-quality layered images, and they establish a benchmark for future work.
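The four generated outputs are tied together by the standard alpha-compositing relation: the composed image is the foreground blended over the background according to the layer mask. A minimal NumPy sketch of that relation (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def composite(foreground, background, mask):
    """Alpha-composite a foreground over a background using a layer mask.

    All arrays are floats in [0, 1]; the mask broadcasts over the channel
    axis. This is the compositing relation that links the generated
    background, foreground, and mask to the composed image.
    """
    return mask * foreground + (1.0 - mask) * background

# Toy 2x2 RGB example: a hard mask keeps the left column from the
# foreground and the right column from the background.
fg = np.ones((2, 2, 3))            # white foreground
bg = np.zeros((2, 2, 3))           # black background
mask = np.array([[1.0, 0.0],
                 [1.0, 0.0]])[..., None]

composed = composite(fg, bg, mask)
```

Generating the mask jointly with the layers, rather than segmenting the composed image afterwards, is what the abstract credits for the higher-quality masks.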