Personalized Text-to-Image Generation with Auto-Regressive Models

Abstract

Personalized image synthesis has emerged as a pivotal application of text-to-image generation, enabling the creation of images that feature a specific subject in diverse contexts. While diffusion models have dominated this domain, auto-regressive models, which model text and images within a unified architecture, remain underexplored for personalized image generation. This paper investigates how auto-regressive models can be optimized for personalized image synthesis by leveraging their inherent multimodal capabilities. We propose a two-stage training strategy that combines optimization of text embeddings with fine-tuning of transformer layers. Our experiments on an auto-regressive model show that this method achieves subject fidelity and prompt following comparable to the leading diffusion-based personalization methods. The results highlight the effectiveness of auto-regressive models for personalized image generation and offer a new direction for future research in this area.
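
The two-stage idea can be illustrated with a deliberately tiny toy, which is not the authors' implementation. The sketch assumes that stage one updates only the embedding of a new subject token while the rest of the model stays frozen, and that stage two then fine-tunes the layer weights with the embedding held fixed. The abstract does not specify these details, so the freezing scheme, the `ToyModel` class, the squared-error loss, and all hyperparameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ToyModel:
    # Hypothetical stand-ins: `embedding` plays the role of the new subject
    # token's text embedding, `layer` plays the role of transformer weights.
    embedding: list[float]
    layer: list[float]

    def predict(self) -> float:
        # A dot product replaces the real forward pass.
        return sum(e * w for e, w in zip(self.embedding, self.layer))


def loss(model: ToyModel, target: float) -> float:
    # Squared error against a scalar target (assumed, for illustration only).
    return (model.predict() - target) ** 2


def train_stage(model: ToyModel, target: float, train_embedding: bool,
                train_layer: bool, steps: int = 200, lr: float = 0.05) -> float:
    """Run plain gradient descent, updating only the selected parameter group."""
    for _ in range(steps):
        err = 2.0 * (model.predict() - target)
        grad_e = [err * w for w in model.layer]
        grad_w = [err * e for e in model.embedding]
        if train_embedding:
            model.embedding = [e - lr * g for e, g in zip(model.embedding, grad_e)]
        if train_layer:
            model.layer = [w - lr * g for w, g in zip(model.layer, grad_w)]
    return loss(model, target)


def run_two_stage(target: float = 1.0):
    model = ToyModel(embedding=[0.1, -0.2], layer=[0.5, 0.3])
    initial_loss = loss(model, target)
    layer_before = list(model.layer)

    # Stage 1 (assumed): optimize the text embedding only.
    stage1_loss = train_stage(model, target, train_embedding=True, train_layer=False)
    layer_after_stage1 = list(model.layer)
    embedding_after_stage1 = list(model.embedding)

    # Stage 2 (assumed): fine-tune the layer weights with the embedding fixed.
    stage2_loss = train_stage(model, target, train_embedding=False, train_layer=True)

    return {
        "initial_loss": initial_loss,
        "stage1_loss": stage1_loss,
        "stage2_loss": stage2_loss,
        "layer_frozen_in_stage1": layer_before == layer_after_stage1,
        "embedding_frozen_in_stage2": embedding_after_stage1 == model.embedding,
    }


if __name__ == "__main__":
    print(run_two_stage())
```

In a real auto-regressive text-to-image model the same schedule would amount to choosing which parameter groups receive gradient updates in each stage; the toy only shows that selection logic.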
