Style Aligned Image Generation via Shared Attention

Abstract

Large-scale Text-to-Image (T2I) models have rapidly gained prominence across creative fields, generating visually compelling outputs from textual prompts. However, controlling these models to ensure a consistent style remains challenging: existing methods require fine-tuning and manual intervention to disentangle content and style. In this paper, we introduce StyleAligned, a novel technique designed to establish style alignment among a series of generated images. By employing minimal 'attention sharing' during the diffusion process, our method maintains style consistency across images within T2I models. This approach also allows style-consistent images to be created from a reference style through a straightforward inversion operation. Evaluation of our method across diverse styles and text prompts demonstrates high-quality synthesis and fidelity, underscoring its efficacy in achieving a consistent style across various inputs.
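The abstract does not spell out how attention is shared, so the following is only a minimal sketch of one plausible reading: within a batch of images generated together, each image's attention queries attend to its own keys and values plus those of a designated reference image. The function names (`attention`, `shared_attention`), the `ref_index` parameter, and the choice of a single reference image are illustrative assumptions, not the authors' implementation, and the sketch uses plain Python lists rather than a real diffusion model.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim_out = len(values[0])
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(dim_out)])
    return out


def shared_attention(batch, ref_index=0):
    """Hypothetical attention sharing across a batch of images.

    `batch` is a list of (Q, K, V) triples, one per image. The reference
    image attends only to itself; every other image attends to the
    reference's keys/values concatenated with its own (an assumption
    made for illustration).
    """
    _, ref_k, ref_v = batch[ref_index]
    outputs = []
    for i, (q, k, v) in enumerate(batch):
        if i == ref_index:
            outputs.append(attention(q, k, v))
        else:
            outputs.append(attention(q, ref_k + k, ref_v + v))
    return outputs


if __name__ == "__main__":
    # Two toy "images", each with one token of dimension 2.
    reference = ([[1.0, 0.0]], [[1.0, 0.0]], [[10.0, 0.0]])
    target = ([[0.0, 1.0]], [[0.0, 1.0]], [[0.0, 5.0]])
    print(shared_attention([reference, target]))
```

In this toy setting the target image's output mixes in the reference image's values, which is the intuition behind sharing attention to propagate style. Where in the network the sharing happens and how strongly it is applied are left open here, since the abstract describes the sharing only as "minimal".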
