StyleAdapter: A Single-Pass LoRA-Free Model for Stylized Image Generation

Abstract

This paper presents a LoRA-free method for stylized image generation that takes a text prompt and style reference images as inputs and produces an output image in a single pass. Unlike existing methods, which train a separate LoRA for each style, our method adapts to various styles with a single unified model. This design poses two challenges: 1) the prompt loses control over the generated content, and 2) the output image inherits both the semantic and the style features of the style reference images, which compromises content fidelity. To address these challenges, we introduce StyleAdapter, a model with two components: a two-path cross-attention module (TPCA) and three decoupling strategies. Together they let the model process the prompt and style reference features separately and weaken the strong coupling between semantic and style information in the style references. StyleAdapter generates high-quality images that match the content of the prompt and adopt the style of the references, even for unseen styles, in a single pass, making it more flexible and efficient than previous methods. Experiments demonstrate that our method outperforms previous works.
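
The abstract names a two-path cross-attention module but gives no equations. The sketch below only illustrates the general idea of attending to prompt features and style features on separate paths and then merging the results. Everything in it is an assumption made for illustration: the plain dot-product attention without learned projections, the residual sum used to merge the two paths, and the hypothetical `style_weight` parameter. It is not the authors' implementation, and it does not cover the three decoupling strategies.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Scaled dot-product attention; context tokens act as keys and values.

    Assumption: no learned query/key/value projections, to keep the sketch short.
    """
    dim = len(queries[0])
    scale = math.sqrt(dim)
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in context]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, context))
                    for d in range(dim)])
    return out


def two_path_cross_attention(queries, text_feats, style_feats, style_weight=0.5):
    """Attend to prompt and style features on separate paths, then merge.

    Assumption: the merge is a residual sum, with the hypothetical
    `style_weight` scaling the style path.
    """
    text_out = cross_attention(queries, text_feats)
    style_out = cross_attention(queries, style_feats)
    return [[q + t + style_weight * s for q, t, s in zip(q_row, t_row, s_row)]
            for q_row, t_row, s_row in zip(queries, text_out, style_out)]


# Toy usage: 2 image tokens, 3 prompt tokens, 2 style tokens, feature size 2.
image_tokens = [[1.0, 0.0], [0.0, 1.0]]
prompt_tokens = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
style_tokens = [[2.0, 2.0], [-1.0, 1.0]]
merged = two_path_cross_attention(image_tokens, prompt_tokens, style_tokens)
```

Keeping the two paths separate means the style contribution can be scaled or removed without touching the prompt path. Setting `style_weight=0.0` in this sketch reduces it to ordinary prompt-only cross-attention.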
