Generating Illustrated Instructions

Abstract

We introduce the new task of generating Illustrated Instructions, i.e., visual instructions customized to a user's needs. We identify desiderata unique to this task and formalize it through a suite of automatic and human evaluation metrics designed to measure the validity, consistency, and efficacy of the generations. We combine large language models (LLMs) with strong text-to-image diffusion models to propose a simple approach, StackedDiffusion, which generates such illustrated instructions from a text input. The resulting model strongly outperforms baseline approaches and state-of-the-art multimodal LLMs, and in 30% of cases users even prefer it to human-generated articles. Most notably, it enables new applications that go well beyond what static articles on the web can provide, such as personalized instructions, complete with intermediate steps and pictures, tailored to a user's individual situation.