StyleDrop: Text-to-Image Generation in Any Style

Abstract

Pre-trained large text-to-image models synthesize impressive images when given appropriate text prompts. However, ambiguities inherent in natural language and out-of-distribution effects make it hard to synthesize image styles that leverage a specific design pattern, texture, or material. In this paper, we introduce StyleDrop, a method that enables the synthesis of images that faithfully follow a specific style using a text-to-image model. The proposed method is extremely versatile and captures nuances and details of a user-provided style, such as color schemes, shading, design patterns, and local and global effects. It efficiently learns a new style by fine-tuning very few trainable parameters (less than 1% of total model parameters) and improving the quality via iterative training with either human or automated feedback. Better yet, StyleDrop is able to deliver impressive results even when the user supplies only a single image that specifies the desired style. An extensive study shows that, for the task of style tuning text-to-image models, StyleDrop implemented on Muse convincingly outperforms other methods, including DreamBooth and textual inversion on Imagen or Stable Diffusion. More results are available at our project website: https://styledrop.github.io

