OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data
May 24, 2025
Authors: Yiren Song, Cheng Liu, Mike Zheng Shou
cs.AI
Abstract
Diffusion models have advanced image stylization significantly, yet two core challenges persist: (1) maintaining consistent stylization in complex scenes, particularly identity, composition, and fine details; and (2) preventing style degradation in image-to-image pipelines that use style LoRAs. GPT-4o's exceptional stylization consistency highlights the performance gap between open-source methods and proprietary models. To bridge this gap, we propose OmniConsistency, a universal consistency plugin leveraging large-scale Diffusion Transformers (DiTs). OmniConsistency contributes: (1) an in-context consistency learning framework trained on aligned image pairs for robust generalization; (2) a two-stage progressive learning strategy that decouples style learning from consistency preservation to mitigate style degradation; and (3) a fully plug-and-play design compatible with arbitrary style LoRAs under the Flux framework. Extensive experiments show that OmniConsistency significantly enhances visual coherence and aesthetic quality, achieving performance comparable to the commercial state-of-the-art model GPT-4o.
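
The plug-and-play claim means the consistency module is trained independently of any particular style and is combined with an off-the-shelf style LoRA only at inference time. The sketch below illustrates that idea with the diffusers multi-adapter API on a Flux image-to-image pipeline; the repository IDs and paths are placeholders, and the paper's actual in-context conditioning may differ from a plain img2img call, so treat this as an assumption-laden illustration rather than the official integration.

```python
# Minimal sketch (not the official OmniConsistency API): stacking a style LoRA
# with a style-agnostic consistency LoRA on a Flux pipeline via diffusers'
# multi-adapter support. All repository IDs and file names are placeholders.
import torch
from diffusers import FluxImg2ImgPipeline
from diffusers.utils import load_image

pipe = FluxImg2ImgPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Any off-the-shelf Flux style LoRA (placeholder repo ID).
pipe.load_lora_weights("your-org/anime-style-lora", adapter_name="style")
# Hypothetical consistency module distributed as LoRA weights (placeholder repo ID).
pipe.load_lora_weights("your-org/omniconsistency-lora", adapter_name="consistency")
# Activate both adapters; because the consistency adapter is trained to be
# style-agnostic, any style adapter can be swapped in without retraining it.
pipe.set_adapters(["style", "consistency"], adapter_weights=[1.0, 1.0])

source = load_image("input.jpg")
result = pipe(
    prompt="a portrait in hand-drawn animation style",
    image=source,
    strength=0.8,            # how strongly the source image is repainted
    guidance_scale=3.5,
    num_inference_steps=28,
).images[0]
result.save("stylized.jpg")
```

Under this setup, switching styles only requires replacing the "style" adapter; the consistency weights stay fixed, which is what makes the design plug-and-play.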