

OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data

May 24, 2025
作者: Yiren Song, Cheng Liu, Mike Zheng Shou
cs.AI

Abstract

Diffusion models have advanced image stylization significantly, yet two core challenges persist: (1) maintaining consistent stylization in complex scenes, particularly identity, composition, and fine details, and (2) preventing style degradation in image-to-image pipelines with style LoRAs. GPT-4o's exceptional stylization consistency highlights the performance gap between open-source methods and proprietary models. To bridge this gap, we propose OmniConsistency, a universal consistency plugin leveraging large-scale Diffusion Transformers (DiTs). OmniConsistency contributes: (1) an in-context consistency learning framework trained on aligned image pairs for robust generalization; (2) a two-stage progressive learning strategy that decouples style learning from consistency preservation to mitigate style degradation; and (3) a fully plug-and-play design compatible with arbitrary style LoRAs under the Flux framework. Extensive experiments show that OmniConsistency significantly enhances visual coherence and aesthetic quality, achieving performance comparable to the commercial state-of-the-art model GPT-4o.
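The decoupling idea in contribution (2) can be illustrated with a minimal toy sketch: style adapters are trained first, then frozen while only the consistency module is updated, with styles rotated between steps. All class and function names here are illustrative assumptions, not the paper's actual implementation.

```python
import random

class LoRA:
    """Toy stand-in for a low-rank adapter: a named list of scalar weights."""
    def __init__(self, name, n=4):
        self.name = name
        self.weights = [0.0] * n
        self.frozen = False

    def step(self, lr=0.1):
        # Placeholder update: nudge weights toward a dummy target of 1.0.
        if self.frozen:
            return
        self.weights = [w + lr * (1.0 - w) for w in self.weights]

def train_stage1(style_loras, steps=5):
    # Stage 1 (assumed): fit each style LoRA independently on style data.
    for lora in style_loras:
        for _ in range(steps):
            lora.step()

def train_stage2(consistency_lora, style_loras, steps=5):
    # Stage 2 (assumed): freeze all style LoRAs, then train only the
    # consistency module on paired data, swapping the active style each
    # step so the consistency objective stays style-agnostic.
    for lora in style_loras:
        lora.frozen = True
    for _ in range(steps):
        _active_style = random.choice(style_loras)  # swapped in, never updated
        consistency_lora.step()
```

The key property this sketch captures is that stage 2 leaves the style weights untouched, which is how the strategy avoids the style degradation described in the abstract.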

