CSGO: Content-Style Composition in Text-to-Image Generation
August 29, 2024
Authors: Peng Xing, Haofan Wang, Yanpeng Sun, Qixun Wang, Xu Bai, Hao Ai, Renyuan Huang, Zechao Li
cs.AI
Abstract
The diffusion model has shown exceptional capabilities in controlled image generation, which has further fueled interest in image style transfer. Owing to the scarcity of suitable data, existing works mainly focus on training-free methods (e.g., image inversion). In this study, we present a data construction pipeline for content-style-stylized image triplets that generates and automatically cleanses stylized data triplets. Based on this pipeline, we construct IMAGStyle, the first large-scale style transfer dataset, containing 210k image triplets and available for the community to explore and research. Equipped with IMAGStyle, we propose CSGO, an end-to-end trained style transfer model that explicitly decouples content and style features through independent feature injection. The unified CSGO implements image-driven style transfer, text-driven stylized synthesis, and text editing-driven stylized synthesis. Extensive experiments demonstrate the effectiveness of our approach in enhancing style control in image generation. Additional visualizations and the source code are available on the project page: https://csgo-gen.github.io/.
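The abstract describes "independent feature injection" for decoupled content and style but gives no implementation details. As a rough illustration only, here is a minimal PyTorch sketch of one way such an injection block could look: separate cross-attention branches attend to content-image and style-image embeddings, and their outputs are added to the diffusion backbone's hidden states with learned scales. The class name, dimensions, and the choice of cross-attention are all assumptions, not the authors' released code.

```python
# Hypothetical sketch of decoupled content/style feature injection.
# Everything here (names, shapes, cross-attention design) is an
# illustrative assumption, not CSGO's actual implementation.
import torch
import torch.nn as nn

class DecoupledInjectionBlock(nn.Module):
    """Injects content and style signals via independent cross-attention
    branches so the two feature streams never share parameters."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        # Separate attention modules keep content and style decoupled.
        self.content_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.style_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Learned scales balance content fidelity against style strength.
        self.content_scale = nn.Parameter(torch.ones(1))
        self.style_scale = nn.Parameter(torch.ones(1))

    def forward(self, hidden, content_feats, style_feats):
        h = self.norm(hidden)
        c, _ = self.content_attn(h, content_feats, content_feats)
        s, _ = self.style_attn(h, style_feats, style_feats)
        # Residual update: inject both signals into the backbone features.
        return hidden + self.content_scale * c + self.style_scale * s

# Toy usage: latent tokens from a diffusion block attend separately to
# content-image and style-image embeddings (shapes are placeholders).
block = DecoupledInjectionBlock(dim=768)
hidden = torch.randn(2, 64, 768)
content_feats = torch.randn(2, 77, 768)
style_feats = torch.randn(2, 77, 768)
print(block(hidden, content_feats, style_feats).shape)  # torch.Size([2, 64, 768])
```

One design point worth noting: keeping the branches separate, rather than concatenating content and style tokens into a single attention call, is what makes the injection "independent" in the sense the abstract describes, and the per-branch scales give an explicit knob for trading content preservation against stylization strength.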