InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation

June 30, 2024
Authors: Haofan Wang, Peng Xing, Renyuan Huang, Hao Ai, Qixun Wang, Xu Bai
cs.AI

Abstract

Style transfer is an inventive process designed to create an image that maintains the essence of the original while embracing the visual style of another. Although diffusion models have demonstrated impressive generative power in personalized subject-driven or style-driven applications, existing state-of-the-art methods still struggle to balance content preservation against style enhancement. For example, amplifying the style's influence often undermines the structural integrity of the content. To address these challenges, we deconstruct the style transfer task into three core elements: 1) Style, focusing on the image's aesthetic characteristics; 2) Spatial Structure, concerning the geometric arrangement and composition of visual elements; and 3) Semantic Content, which captures the conceptual meaning of the image. Guided by these principles, we introduce InstantStyle-Plus, an approach that prioritizes the integrity of the original content while seamlessly integrating the target style. Specifically, our method accomplishes style injection through an efficient, lightweight process, utilizing the cutting-edge InstantStyle framework. To reinforce content preservation, we initiate the process with an inverted content latent noise and a versatile plug-and-play tile ControlNet that preserves the original image's intrinsic layout. We also incorporate a global semantic adapter to enhance the fidelity of the semantic content. To safeguard against the dilution of style information, a style extractor is employed as a discriminator to provide supplementary style guidance. Code will be available at https://github.com/instantX-research/InstantStyle-Plus.
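
The last step of the abstract, using a style extractor as a discriminator for supplementary style guidance, can be illustrated with a toy sketch. The abstract does not state which extractor, loss, or update rule the authors use, so everything below is an assumption made for illustration: a fixed random linear map stands in for the style extractor, cosine distance stands in for the style score, and the names `style_embedding`, `style_distance`, and `style_guidance_step` are hypothetical. It is not the InstantStyle-Plus implementation.

```python
import numpy as np


def style_embedding(x, W):
    """Hypothetical style extractor: linear map W, then L2 normalisation."""
    z = W @ x
    return z / np.linalg.norm(z)


def style_distance(x, ref_emb, W):
    """1 - cosine similarity between the style embedding of x and the reference."""
    return 1.0 - float(style_embedding(x, W) @ ref_emb)


def style_guidance_step(x, ref_emb, W, step_size=0.1):
    """One gradient-ascent step on cosine similarity with respect to x."""
    z = W @ x
    n = np.linalg.norm(z)
    e = z / n
    # Gradient of (e . ref_emb) with respect to z, for unit-norm ref_emb.
    grad_z = (ref_emb - (e @ ref_emb) * e) / n
    # Chain rule back through the linear extractor.
    grad_x = W.T @ grad_z
    return x + step_size * grad_x


# Toy data (assumed): a 16-dim "latent" and an 8-dim style embedding space.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
content = rng.standard_normal(16)
style_ref = style_embedding(rng.standard_normal(16), W)

x = content.copy()
before = style_distance(x, style_ref, W)
for _ in range(50):
    x = style_guidance_step(x, style_ref, W)
after = style_distance(x, style_ref, W)

# Drift from the starting latent: the cost that content-preservation
# components are meant to keep in check.
drift = float(np.linalg.norm(x - content))
print(f"style distance before={before:.3f} after={after:.3f} drift={drift:.3f}")
```

In this toy setting the guidance pulls the latent toward the reference style while moving it away from its starting point, which mirrors the tension between style strength and content preservation that the abstract describes.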
