A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space
November 13, 2025
Authors: Huijie Liu, Shuhao Cui, Haoxiang Cao, Shuai Ma, Kai Wu, Guoliang Kang
cs.AI
Abstract
Innovative visual stylization is a cornerstone of artistic creation, yet generating novel and consistent visual styles remains a significant challenge. Existing generative approaches typically rely on lengthy textual prompts, reference images, or parameter-efficient fine-tuning to guide style-aware image generation, but they often struggle with style consistency, limited creativity, and complex style representations. In this paper, we affirm that a style is worth one numerical code by introducing a novel task, code-to-style image generation, which produces images with novel, consistent visual styles conditioned solely on a numerical style code. To date, this field has been explored primarily by industry (e.g., Midjourney), with no open-source research from the academic community. To fill this gap, we propose CoTyle, the first open-source method for this task. Specifically, we first train a discrete style codebook from a collection of images to extract style embeddings. These embeddings serve as conditions for a text-to-image diffusion model (T2I-DM) to generate stylistic images. Subsequently, we train an autoregressive style generator on the discrete style embeddings to model their distribution, allowing the synthesis of novel style embeddings. During inference, a numerical style code is mapped to a unique style embedding by the style generator, and this embedding guides the T2I-DM to generate images in the corresponding style. Unlike existing methods, our approach offers unparalleled simplicity and diversity, unlocking a vast space of reproducible styles from minimal input. Extensive experiments validate that CoTyle effectively turns a numerical code into a style controller, demonstrating that a style is worth one code.
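To make the inference pipeline described above concrete, the sketch below shows one plausible way a numerical style code could seed an autoregressive sampler over a discrete style codebook and yield a single style embedding that conditions a text-to-image diffusion model. This is a minimal, hypothetical PyTorch illustration: the module names, codebook size, pooling step, and the `style_embedding` interface of the diffusion model are all assumptions, not the released CoTyle implementation.

```python
# Minimal sketch of the code-to-style inference flow (illustrative only).
import torch
import torch.nn as nn


class StyleGenerator(nn.Module):
    """Toy autoregressive generator over a discrete style codebook (assumed design)."""

    def __init__(self, codebook_size=1024, embed_dim=256, seq_len=8):
        super().__init__()
        self.seq_len = seq_len
        self.codebook = nn.Embedding(codebook_size, embed_dim)        # discrete style codebook
        self.token_emb = nn.Embedding(codebook_size + 1, embed_dim)   # +1 for a BOS token
        layer = nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(embed_dim, codebook_size)
        self.bos = codebook_size

    @torch.no_grad()
    def sample(self, style_code: int) -> torch.Tensor:
        # The numerical style code seeds the sampler, so the same code
        # always reproduces the same sequence of codebook indices.
        gen = torch.Generator().manual_seed(style_code)
        tokens = torch.tensor([[self.bos]])
        for _ in range(self.seq_len):
            h = self.backbone(self.token_emb(tokens))
            probs = torch.softmax(self.head(h[:, -1]), dim=-1)
            nxt = torch.multinomial(probs, 1, generator=gen)
            tokens = torch.cat([tokens, nxt], dim=1)
        # Look up the sampled codebook entries and pool them into one style embedding.
        return self.codebook(tokens[:, 1:]).mean(dim=1)


def generate_stylized_image(style_code: int, prompt: str, style_gen, t2i_dm):
    """Map a numerical code to a style embedding, then condition the T2I-DM on it."""
    style_emb = style_gen.sample(style_code)
    # `t2i_dm` stands in for a diffusion model that accepts an extra style-embedding
    # condition alongside the text prompt (assumed interface).
    return t2i_dm(prompt=prompt, style_embedding=style_emb)
```

In this reading, reproducibility comes from determinism: a fixed numerical code fixes the sampled codebook indices and hence the style embedding, so every prompt rendered under that code shares one consistent visual style.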