RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths
May 29, 2023
Authors: Zeyue Xue, Guanglu Song, Qiushan Guo, Boxiao Liu, Zhuofan Zong, Yu Liu, Ping Luo
cs.AI
Abstract
Text-to-image generation has recently witnessed remarkable achievements. We
introduce a text-conditional image diffusion model, termed RAPHAEL, to generate
highly artistic images, which accurately portray the text prompts, encompassing
multiple nouns, adjectives, and verbs. This is achieved by stacking tens of
mixture-of-experts (MoE) layers, i.e., space-MoE and time-MoE layers, enabling
billions of diffusion paths (routes) from the network input to the output. Each
path intuitively functions as a "painter" for depicting a particular textual
concept onto a specified image region at a diffusion timestep. Comprehensive
experiments reveal that RAPHAEL outperforms recent cutting-edge models, such as
Stable Diffusion, ERNIE-ViLG 2.0, DeepFloyd, and DALL-E 2, in terms of both
image quality and aesthetic appeal. Firstly, RAPHAEL exhibits superior
performance in switching images across diverse styles, such as Japanese comics,
realism, cyberpunk, and ink illustration. Secondly, a single model with three
billion parameters, trained on 1,000 A100 GPUs for two months, achieves a
state-of-the-art zero-shot FID score of 6.61 on the COCO dataset. Furthermore,
RAPHAEL significantly surpasses its counterparts in human evaluation on the
ViLG-300 benchmark. We believe that RAPHAEL holds the potential to propel the
frontiers of image generation research in both academia and industry, paving
the way for future breakthroughs in this rapidly evolving field. More details
can be found on the project webpage: https://raphael-painter.github.io/.
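
The claim that stacking tens of MoE layers yields billions of diffusion paths follows from simple counting. The sketch below is only an illustration, not the authors' implementation: it assumes each layer independently selects exactly one expert, and the layer and expert counts are hypothetical values chosen for the example rather than figures from the abstract. The function name `count_paths` is likewise made up here.

```python
from math import prod


def count_paths(experts_per_layer):
    """Count distinct input-to-output routes through stacked MoE layers.

    Assumption (not from the abstract): every layer independently picks
    exactly one expert, so the route count is the product of the
    per-layer expert counts.
    """
    if any(n < 1 for n in experts_per_layer):
        raise ValueError("each layer needs at least one expert")
    return prod(experts_per_layer)


# Hypothetical configuration: 16 stacked MoE layers with 4 experts each.
hypothetical_layers = [4] * 16
total = count_paths(hypothetical_layers)
print(f"{total:,} possible routes")  # 4,294,967,296 possible routes
```

Under these assumed numbers, 16 layers with 4 experts each already give 4^16, about 4.3 billion routes, which shows how a modest number of experts per layer compounds into "billions of diffusion paths" once tens of layers are stacked.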