NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation

Abstract


Despite impressive recent advances in text-to-image diffusion models, obtaining high-quality images often requires prompt engineering by humans who have developed expertise with these models. In this work, we present NeuroPrompts, an adaptive framework that automatically enhances a user's prompt to improve the quality of images produced by text-to-image models. Our framework uses constrained text decoding with a pre-trained language model that has been adapted to generate prompts similar to those written by human prompt engineers. This approach enables higher-quality text-to-image generation and lets users control stylistic features by specifying a set of constraints. We demonstrate the utility of our framework with an interactive application for prompt enhancement and image generation built on Stable Diffusion. Additionally, we conduct experiments using a large dataset of human-engineered prompts for text-to-image generation and show that our approach automatically produces enhanced prompts that yield better image quality. We make our code, a screencast video demo, and a live demo instance of NeuroPrompts publicly available.
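To make "constrained text decoding" with a user-specified constraint set more concrete, here is a minimal sketch of lexically constrained beam search. It is not the NeuroPrompts decoder: the toy bigram table `TOY_LM`, the style phrases ("oil painting", "highly detailed"), the function `constrained_beam_search`, and the additive `bonus` per satisfied constraint are all hypothetical simplifications chosen for illustration. Each beam is scored by its log-probability plus a fixed reward for every constraint phrase it already contains, and at the end a beam that satisfies all constraints is preferred.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical toy "language model": log P(next token | previous token).
# Multi-word style phrases are treated as single tokens for simplicity.
TOY_LM: Dict[str, Dict[str, float]] = {
    "cat": {"sitting": math.log(0.7), ",": math.log(0.3)},
    "sitting": {"quietly": math.log(1.0)},
    "quietly": {},  # no continuation: the beam simply stops growing
    ",": {"oil painting": math.log(0.5), "highly detailed": math.log(0.5)},
    "oil painting": {",": math.log(1.0)},
    "highly detailed": {",": math.log(1.0)},
}

Beam = Tuple[List[str], float]  # (tokens, cumulative log-probability)


def constrained_beam_search(
    lm: Dict[str, Dict[str, float]],
    prefix: List[str],
    constraints: Set[str],
    max_new_tokens: int = 4,
    beam_size: int = 3,
    bonus: float = 2.0,
) -> List[str]:
    """Extend a non-empty `prefix`, favouring beams with more constraint tokens."""

    def num_satisfied(tokens: List[str]) -> int:
        return len(constraints.intersection(tokens))

    def score(beam: Beam) -> float:
        tokens, logp = beam
        # Fluency (log-prob) plus a fixed reward per satisfied constraint.
        return logp + bonus * num_satisfied(tokens)

    beams: List[Beam] = [(list(prefix), 0.0)]
    for _ in range(max_new_tokens):
        candidates: List[Beam] = []
        for tokens, logp in beams:
            next_tokens = lm.get(tokens[-1], {})
            if not next_tokens:
                candidates.append((tokens, logp))  # finished beam, keep as-is
                continue
            for tok, lp in next_tokens.items():
                candidates.append((tokens + [tok], logp + lp))
        # Keep only the `beam_size` highest-scoring candidates.
        beams = sorted(candidates, key=score, reverse=True)[:beam_size]

    # Prefer beams that satisfy every constraint; otherwise take the best score.
    complete = [b for b in beams if num_satisfied(b[0]) == len(constraints)]
    return max(complete or beams, key=score)[0]


if __name__ == "__main__":
    user_prompt = ["a", "cat"]
    print(" ".join(constrained_beam_search(TOY_LM, user_prompt, set())))
    print(" ".join(constrained_beam_search(
        TOY_LM, user_prompt, {"oil painting", "highly detailed"})))
```

With an empty constraint set the search simply follows the most probable continuation ("a cat sitting quietly"); with the style constraints it gives up some log-probability to include both phrases. The additive bonus is the simplest way to trade fluency against constraint satisfaction; practical constrained decoders track partially matched phrases and constraint state more carefully.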
