OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data

Abstract

Diffusion models have advanced image stylization significantly, yet two core challenges persist: (1) maintaining consistent stylization in complex scenes, particularly in identity, composition, and fine details, and (2) preventing style degradation in image-to-image pipelines that use style LoRAs. GPT-4o's exceptional stylization consistency highlights the performance gap between open-source methods and proprietary models. To bridge this gap, we propose OmniConsistency, a universal consistency plugin that leverages large-scale Diffusion Transformers (DiTs). OmniConsistency contributes: (1) an in-context consistency learning framework trained on aligned image pairs for robust generalization; (2) a two-stage progressive learning strategy that decouples style learning from consistency preservation to mitigate style degradation; and (3) a fully plug-and-play design compatible with arbitrary style LoRAs under the Flux framework. Extensive experiments show that OmniConsistency significantly enhances visual coherence and aesthetic quality, achieving performance comparable to the commercial state-of-the-art model GPT-4o.
