Text2Layer: Layered Image Generation using Latent Diffusion Model

Abstract

Layer compositing is one of the most popular image editing workflows among both amateurs and professionals. Motivated by the success of diffusion models, we explore layer compositing from a layered image generation perspective. Instead of generating a single image, we propose to generate the background, foreground, layer mask, and composed image simultaneously. To achieve layered image generation, we train an autoencoder that can reconstruct layered images and train diffusion models on its latent representation. One benefit of this formulation is that it enables better compositing workflows in addition to high-quality image output. Another benefit is that it produces higher-quality layer masks than those obtained from a separate image segmentation step. Experimental results show that the proposed method generates high-quality layered images and establishes a benchmark for future work.
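
The abstract does not spell out how the composed image relates to the other three outputs. As a rough illustration only, the sketch below assumes standard alpha blending, where each output pixel is the mask-weighted mix of foreground and background. The function name `compose` and the nested-list image layout are hypothetical choices made for this example and are not taken from the paper.

```python
from typing import List

Pixel = List[float]          # one RGB pixel, channel values in [0, 1]
Image = List[List[Pixel]]    # rows of pixels
Mask = List[List[float]]     # per-pixel foreground opacity in [0, 1]


def compose(foreground: Image, background: Image, mask: Mask) -> Image:
    """Blend two layers per pixel: out = m * fg + (1 - m) * bg.

    Assumption: plain alpha blending; the paper may define compositing differently.
    """
    return [
        [
            [m * f + (1.0 - m) * b for f, b in zip(fg_px, bg_px)]
            for fg_px, bg_px, m in zip(fg_row, bg_row, mask_row)
        ]
        for fg_row, bg_row, mask_row in zip(foreground, background, mask)
    ]


# 1x2 example: the first pixel is fully foreground, the second fully background.
fg = [[[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]]   # red foreground layer
bg = [[[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]]   # blue background layer
mask = [[1.0, 0.0]]
composed = compose(fg, bg, mask)
```

Under this assumption, a sharper mask directly yields a cleaner composite, which is one way to read the abstract's emphasis on layer mask quality.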
