Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation

Abstract

We introduce "Idea to Image" (Idea2Img), a system that enables multimodal iterative self-refinement with GPT-4V(ision) for automatic image design and generation. Humans can quickly identify the characteristics of different text-to-image (T2I) models through iterative exploration. This lets them efficiently convert high-level generation ideas into effective T2I prompts that produce good images. We investigate whether systems based on large multimodal models (LMMs) can develop analogous multimodal self-refinement abilities, allowing them to explore unknown models or environments through self-refining attempts. Idea2Img cyclically generates revised T2I prompts to synthesize draft images and provides directional feedback for prompt revision, both conditioned on its memory of the probed T2I model's characteristics. This iterative self-refinement gives Idea2Img several advantages over vanilla T2I models. Notably, Idea2Img can process input ideas given as interleaved image-text sequences, follow ideas that contain design instructions, and generate images of better semantic and visual quality. A user preference study validates the efficacy of multimodal iterative self-refinement for automatic image design and generation.
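The cycle described in the abstract (revise the prompt, synthesize a draft, give feedback, record the round in memory) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `idea2img_loop`, `Memory`, `revise_prompt`, `generate_image`, and `give_feedback` are hypothetical, images are stood in for by plain strings, and the stopping rule (empty feedback ends the loop) is an assumption made for the example. In a real system the three callables would wrap an LMM and a T2I model; here they are toy stubs so the loop runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Memory:
    """History of (prompt, draft image, feedback) from earlier rounds."""
    rounds: List[Tuple[str, str, str]] = field(default_factory=list)


def idea2img_loop(
    idea: str,
    revise_prompt: Callable[[str, Memory], str],       # hypothetical LMM call
    generate_image: Callable[[str], str],              # hypothetical T2I call
    give_feedback: Callable[[str, str, Memory], str],  # hypothetical LMM call
    max_rounds: int = 3,
) -> Tuple[str, Memory]:
    """Run the revise -> draft -> feedback cycle, conditioning on memory."""
    memory = Memory()
    draft = ""
    for _ in range(max_rounds):
        prompt = revise_prompt(idea, memory)           # uses past feedback
        draft = generate_image(prompt)                 # synthesize a draft
        feedback = give_feedback(idea, draft, memory)  # directional feedback
        memory.rounds.append((prompt, draft, feedback))
        if feedback == "":  # assumed stop rule: nothing left to fix
            break
    return draft, memory


# Toy stand-ins so the sketch is runnable without any model.
def toy_revise(idea: str, memory: Memory) -> str:
    hints = [fb for _, _, fb in memory.rounds if fb]
    return idea + "".join(f", {h}" for h in hints)


def toy_generate(prompt: str) -> str:
    return f"<image of: {prompt}>"


def toy_feedback(idea: str, draft: str, memory: Memory) -> str:
    return "" if "warm lighting" in draft else "add warm lighting"


if __name__ == "__main__":
    final, mem = idea2img_loop(
        "a logo for a coffee shop", toy_revise, toy_generate, toy_feedback
    )
    print(final)
    print(len(mem.rounds), "rounds")
```

With the toy stubs, the first round yields feedback asking for warm lighting, the second round folds that feedback into the prompt, and the loop stops once the feedback comes back empty.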
