A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space
November 13, 2025
Authors: Huijie Liu, Shuhao Cui, Haoxiang Cao, Shuai Ma, Kai Wu, Guoliang Kang
cs.AI
Abstract
Innovative visual stylization is a cornerstone of artistic creation, yet generating novel and consistent visual styles remains a significant challenge. Existing generative approaches typically rely on lengthy textual prompts, reference images, or parameter-efficient fine-tuning to guide style-aware image generation, but they often struggle with style consistency, limited creativity, and complex style representations. In this paper, we argue that a style is worth one numerical code and introduce a novel task, code-to-style image generation, which produces images with novel, consistent visual styles conditioned solely on a numerical style code. To date, this task has been explored primarily by industry (e.g., Midjourney), with no open-source research from the academic community. To fill this gap, we propose CoTyle, the first open-source method for this task. Specifically, we first train a discrete style codebook on a collection of images to extract style embeddings. These embeddings serve as conditions for a text-to-image diffusion model (T2I-DM) to generate stylized images. Subsequently, we train an autoregressive style generator on the discrete style embeddings to model their distribution, which allows novel style embeddings to be synthesized. During inference, the style generator maps a numerical style code to a unique style embedding, and this embedding guides the T2I-DM to generate images in the corresponding style. Unlike existing methods, which need prompts, reference images, or fine-tuning, our method requires only a single numerical code, unlocking a vast space of reproducible styles from minimal input. Extensive experiments validate that CoTyle effectively turns a numerical code into a style controller, demonstrating that a style is worth one code.
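The inference path described in the abstract (numerical code, then style generator, then style embedding) can be illustrated with a toy sketch. This is not CoTyle's implementation: the abstract does not say how the code drives the generator, so the sketch assumes the code seeds the sampler. The codebook is random rather than trained, the next-token rule is a hand-written stand-in for the autoregressive style generator, and every name and size (`make_codebook`, `next_token_weights`, `style_from_code`, `CODEBOOK_SIZE`, `EMBED_DIM`, `SEQ_LEN`) is hypothetical. The diffusion model that would consume the embeddings is omitted.

```python
import random

CODEBOOK_SIZE = 8   # hypothetical number of discrete style tokens
EMBED_DIM = 4       # hypothetical embedding width
SEQ_LEN = 6         # hypothetical number of tokens describing one style


def make_codebook(size=CODEBOOK_SIZE, dim=EMBED_DIM):
    """Stand-in for a trained discrete style codebook: one vector per index."""
    rng = random.Random(0)  # fixed seed so the toy codebook is identical on every run
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(size)]


def next_token_weights(prefix, size=CODEBOOK_SIZE):
    """Stand-in for a trained autoregressive style generator.

    Returns unnormalized weights over the next codebook index given the
    tokens sampled so far. This toy rule simply favors indices that are
    close (cyclically) to the previous token.
    """
    if not prefix:
        return [1.0] * size
    prev = prefix[-1]
    return [1.0 / (1 + min(abs(i - prev), size - abs(i - prev))) for i in range(size)]


def style_from_code(style_code, codebook, seq_len=SEQ_LEN):
    """Map an integer style code to (token indices, style embeddings)."""
    # Assumption: the numerical code seeds the sampler, so the same code
    # always reproduces the same token sequence.
    rng = random.Random(style_code)
    tokens = []
    for _ in range(seq_len):
        weights = next_token_weights(tokens, len(codebook))
        tokens.append(rng.choices(range(len(codebook)), weights=weights, k=1)[0])
    # Look up one embedding per sampled token; in a full system these would
    # condition the text-to-image diffusion model.
    embeddings = [codebook[t] for t in tokens]
    return tokens, embeddings


codebook = make_codebook()
tokens_a, emb_a = style_from_code(1234, codebook)
tokens_b, emb_b = style_from_code(1234, codebook)
```

Under this seeding assumption, reproducibility comes for free: calling `style_from_code` twice with the same code returns the same tokens and embeddings, while a different code starts the sampler from a different state.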