CSGO: Content-Style Composition in Text-to-Image Generation
August 29, 2024
Authors: Peng Xing, Haofan Wang, Yanpeng Sun, Qixun Wang, Xu Bai, Hao Ai, Renyuan Huang, Zechao Li
cs.AI
Abstract
The diffusion model has shown exceptional capabilities in controlled image
generation, which has further fueled interest in image style transfer. Existing
works mainly focus on training-free methods (e.g., image inversion) due to the
scarcity of task-specific data. In this study, we present a data construction
pipeline for content-style-stylized image triplets that generates and
automatically cleanses stylized data triplets. Based on this pipeline, we
construct IMAGStyle, the first large-scale style transfer dataset,
containing 210k image triplets, available for the community to explore and
research. Equipped with IMAGStyle, we propose CSGO, a style transfer model
based on end-to-end training, which explicitly decouples content and style
features via independent feature injection. The unified CSGO model supports
image-driven style transfer, text-driven stylized synthesis, and text
editing-driven stylized synthesis. Extensive experiments demonstrate the
effectiveness of our approach in enhancing style control capabilities in image
generation. Additional visualizations and the source code are available on the
project page: https://csgo-gen.github.io/.
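The "independent feature injection" described in the abstract can be illustrated with a minimal sketch: content and style features pass through separate cross-attention branches with separate projection weights, so a style-strength knob can scale the style contribution without touching the content path. Everything here (single-head attention, layer names, the `style_scale` parameter) is an illustrative assumption, not the actual CSGO architecture.

```python
import numpy as np

# Illustrative sketch only -- NOT the CSGO implementation. It shows the general
# idea of injecting content and style through decoupled attention branches.
rng = np.random.default_rng(0)
DIM = 32  # token feature dimension (arbitrary for this demo)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention(query, context, w_q, w_k, w_v):
    """Single-head cross-attention: `query` tokens attend to `context` tokens."""
    q, k, v = query @ w_q, context @ w_k, context @ w_v
    weights = softmax(q @ k.T / np.sqrt(DIM))
    return weights @ v

# Separate projection weights per condition keep the two signals decoupled.
w_content = [rng.standard_normal((DIM, DIM)) * 0.1 for _ in range(3)]
w_style = [rng.standard_normal((DIM, DIM)) * 0.1 for _ in range(3)]

def inject(hidden, content_tokens, style_tokens, style_scale=1.0):
    # Residual injection: each branch is computed independently, then summed,
    # so style strength can be scaled without altering the content branch.
    c = cross_attention(hidden, content_tokens, *w_content)
    s = cross_attention(hidden, style_tokens, *w_style)
    return hidden + c + style_scale * s

hidden = rng.standard_normal((16, DIM))   # latent tokens at one layer
content = rng.standard_normal((8, DIM))   # tokens from a content encoder
style = rng.standard_normal((8, DIM))     # tokens from a style encoder
out = inject(hidden, content, style)
print(out.shape)  # (16, 32)
```

Setting `style_scale=0.0` removes the style branch entirely while leaving the content injection intact, which is the practical payoff of keeping the two branches independent.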