RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths
May 29, 2023
Authors: Zeyue Xue, Guanglu Song, Qiushan Guo, Boxiao Liu, Zhuofan Zong, Yu Liu, Ping Luo
cs.AI
Abstract
Text-to-image generation has recently witnessed remarkable achievements. We
introduce a text-conditional image diffusion model, termed RAPHAEL, to generate
highly artistic images, which accurately portray the text prompts, encompassing
multiple nouns, adjectives, and verbs. This is achieved by stacking tens of
mixture-of-experts (MoEs) layers, i.e., space-MoE and time-MoE layers, enabling
billions of diffusion paths (routes) from the network input to the output. Each
path intuitively functions as a "painter" for depicting a particular textual
concept onto a specified image region at a diffusion timestep. Comprehensive
experiments reveal that RAPHAEL outperforms recent cutting-edge models, such as
Stable Diffusion, ERNIE-ViLG 2.0, DeepFloyd, and DALL-E 2, in terms of both
image quality and aesthetic appeal. Firstly, RAPHAEL exhibits superior
performance in switching images across diverse styles, such as Japanese comics,
realism, cyberpunk, and ink illustration. Secondly, a single model with three
billion parameters, trained on 1,000 A100 GPUs for two months, achieves a
state-of-the-art zero-shot FID score of 6.61 on the COCO dataset. Furthermore,
RAPHAEL significantly surpasses its counterparts in human evaluation on the
ViLG-300 benchmark. We believe that RAPHAEL holds the potential to propel the
frontiers of image generation research in both academia and industry, paving
the way for future breakthroughs in this rapidly evolving field. More details
can be found on a project webpage: https://raphael-painter.github.io/.
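
To make the "billions of diffusion paths" claim concrete, the following is a minimal counting sketch, not the authors' implementation. It assumes each stacked MoE layer selects exactly one expert, so the number of routes is the product of the per-layer expert counts. The configuration (16 layers with 4 experts each) and the function names are hypothetical, chosen only for illustration; the abstract does not state these numbers. The random selection in `sample_path` stands in for the model's routing, which this sketch does not attempt to reproduce.

```python
import math
import random


def count_paths(experts_per_layer):
    """Count distinct routes when every MoE layer picks exactly one expert."""
    return math.prod(experts_per_layer)


def sample_path(experts_per_layer, rng):
    """Pick one expert index per layer, giving a single route through the stack.

    Uniform random choice is a placeholder for the model's actual routing.
    """
    return [rng.randrange(n) for n in experts_per_layer]


# Hypothetical configuration: 16 stacked MoE layers, 4 experts in each.
layers = [4] * 16
total_paths = count_paths(layers)  # 4**16 = 4294967296
one_path = sample_path(layers, random.Random(0))
```

Under these assumed numbers the stack already admits about 4.3 billion routes, which shows how a few dozen layers with a handful of experts each reach the scale described in the abstract.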