

ZePo: Zero-Shot Portrait Stylization with Faster Sampling

August 10, 2024
Authors: Jin Liu, Huaibo Huang, Jie Cao, Ran He
cs.AI

Abstract

Diffusion-based text-to-image generation models have significantly advanced the field of art content synthesis. However, current portrait stylization methods generally require either model fine-tuning based on examples or the employment of DDIM Inversion to revert images to noise space, both of which substantially decelerate the image generation process. To overcome these limitations, this paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps. We observed that Latent Consistency Models employing consistency distillation can effectively extract representative Consistency Features from noisy images. To blend the Consistency Features extracted from both content and style images, we introduce a Style Enhancement Attention Control technique that meticulously merges content and style features within the attention space of the target image. Moreover, we propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control. Extensive experiments have validated the effectiveness of our proposed framework in enhancing stylization efficiency and fidelity. The code is available at https://github.com/liujin112/ZePo.
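The abstract names two mechanisms: attention control that fuses content and style features in the target image's attention space, and a merging step that collapses redundant Consistency Features to reduce the cost of that attention. The NumPy sketch below illustrates both ideas in simplified form; the function names, the adjacent-pair merging rule, and the `style_scale` knob are illustrative assumptions, not the paper's actual implementation (see the linked repository for that).

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def merge_tokens(tokens, keep_ratio=0.5):
    """Reduce the token count by averaging adjacent pairs until roughly
    keep_ratio of the rows remain. A crude stand-in for the paper's
    feature-merging strategy, which merges redundant Consistency Features
    to lower the computational load of attention control."""
    target = max(1, int(round(tokens.shape[0] * keep_ratio)))
    while tokens.shape[0] > target:
        half = tokens.shape[0] // 2
        merged = (tokens[0:2 * half:2] + tokens[1:2 * half:2]) / 2.0
        if tokens.shape[0] % 2:  # carry over an odd trailing token
            merged = np.vstack([merged, tokens[-1:]])
        tokens = merged
    return tokens

def style_enhanced_attention(q, k_c, v_c, k_s, v_s, style_scale=1.0):
    """Fuse content and style in the attention space: queries come from the
    target (content) image, while keys/values concatenate content and style
    features. style_scale is a hypothetical knob that rescales the style
    logits to emphasize or mute the injected style."""
    d = q.shape[-1]
    k = np.concatenate([k_c, k_s], axis=0)
    v = np.concatenate([v_c, v_s], axis=0)
    logits = q @ k.T / np.sqrt(d)
    logits[:, k_c.shape[0]:] *= style_scale  # rescale style-key logits only
    return softmax(logits, axis=-1) @ v
```

In the paper's pipeline these features would be extracted from a Latent Consistency Model's attention layers at each of the four sampling steps; here plain arrays stand in for them. Applying `merge_tokens` to the style keys/values before the fused attention shrinks the concatenated sequence, which is the point of the merging strategy.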