Generating Illustrated Instructions

Abstract

We introduce the new task of generating Illustrated Instructions, i.e., visual instructions customized to a user's needs. We identify desiderata unique to this task and formalize it through a suite of automatic and human evaluation metrics designed to measure the validity, consistency, and efficacy of the generations. We combine the power of large language models (LLMs) with strong text-to-image diffusion models to propose a simple approach called StackedDiffusion, which generates such illustrated instructions given text as input. The resulting model strongly outperforms baseline approaches and state-of-the-art multimodal LLMs, and in 30% of cases users even prefer it to human-generated articles. Most notably, it enables various new and exciting applications far beyond what static articles on the web can provide, such as personalized instructions, complete with intermediate steps and pictures, generated in response to a user's individual situation.

