Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation

Abstract

We introduce "Idea to Image" (Idea2Img), a system that enables multimodal iterative self-refinement with GPT-4V(ision) for automatic image design and generation. Humans can quickly identify the characteristics of different text-to-image (T2I) models through iterative exploration. This lets them efficiently convert their high-level generation ideas into effective T2I prompts that produce good images. We investigate whether systems based on large multimodal models (LMMs) can develop analogous multimodal self-refinement abilities that enable them to explore unknown models or environments through self-refining attempts. Idea2Img cyclically generates revised T2I prompts to synthesize draft images and provides directional feedback for prompt revision, both conditioned on its memory of the probed T2I model's characteristics. Iterative self-refinement gives Idea2Img several advantages over vanilla T2I models. Notably, Idea2Img can process input ideas given as interleaved image-text sequences, follow ideas that include design instructions, and generate images of better semantic and visual quality. A user preference study validates the efficacy of multimodal iterative self-refinement for automatic image design and generation.
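The cycle described in the abstract (revise the prompt, synthesize a draft, give feedback, record the round in memory) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of plain strings as stand-ins for images, the fixed number of rounds, and the toy stubs that replace the LMM and the T2I model are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# One record per round: (prompt, draft image, feedback).
# Hypothetical structure; images are represented as strings here.
Record = Tuple[str, str, str]


def refine(
    idea: str,
    revise_prompt: Callable[[str, List[Record]], str],
    synthesize: Callable[[str], str],
    give_feedback: Callable[[str, str, List[Record]], str],
    rounds: int = 3,  # assumed fixed budget; the abstract gives no stopping rule
) -> Tuple[str, str, List[Record]]:
    """Run the revise -> synthesize -> feedback loop for a fixed number of rounds."""
    memory: List[Record] = []
    prompt, image = idea, ""
    for _ in range(rounds):
        # Prompt revision is conditioned on the memory of earlier rounds.
        prompt = revise_prompt(idea, memory)
        image = synthesize(prompt)
        # Feedback is also conditioned on the same memory.
        feedback = give_feedback(idea, image, memory)
        memory.append((prompt, image, feedback))
    return prompt, image, memory


# Toy stand-ins for the LMM and the T2I model (hypothetical, for demonstration only).
def toy_revise(idea: str, memory: List[Record]) -> str:
    # Append the most recent feedback to the idea, if any exists.
    return idea if not memory else f"{idea}, {memory[-1][2]}"


def toy_synthesize(prompt: str) -> str:
    return f"<image of: {prompt}>"


def toy_feedback(idea: str, image: str, memory: List[Record]) -> str:
    return "add more detail"
```

Calling `refine("a red cat", toy_revise, toy_synthesize, toy_feedback, rounds=2)` runs two rounds, and the second prompt incorporates the feedback from the first. In a real system each stub would be replaced by a call to an LMM or a T2I model.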
