SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
July 4, 2023
Authors: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, Robin Rombach
cs.AI
Abstract
We present SDXL, a latent diffusion model for text-to-image synthesis.
Compared to previous versions of Stable Diffusion, SDXL leverages a three times
larger UNet backbone: The increase of model parameters is mainly due to more
attention blocks and a larger cross-attention context as SDXL uses a second
text encoder. We design multiple novel conditioning schemes and train SDXL on
multiple aspect ratios. We also introduce a refinement model which is used to
improve the visual fidelity of samples generated by SDXL using a post-hoc
image-to-image technique. We demonstrate that SDXL shows drastically improved
performance compared to previous versions of Stable Diffusion and achieves
results competitive with those of black-box state-of-the-art image generators.
In the spirit of promoting open research and fostering transparency in large
model training and evaluation, we provide access to code and model weights at
https://github.com/Stability-AI/generative-models.
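The "novel conditioning schemes" mentioned above refer to micro-conditioning the model on scalars such as the original image size and crop coordinates, embedded with sinusoidal (Fourier) features in the same way as the diffusion timestep. A minimal NumPy sketch of that idea follows; the function names, dimensions, and the exact set of conditioning scalars are illustrative assumptions, not the paper's reference implementation:

```python
import numpy as np

def sinusoidal_embedding(value, dim=256, max_period=10000.0):
    """Map one conditioning scalar (e.g. original image height) to a
    dim-dimensional sinusoidal feature vector, like a timestep embedding."""
    half = dim // 2
    # Geometrically spaced frequencies from 1 down to 1/max_period.
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    args = value * freqs
    return np.concatenate([np.sin(args), np.cos(args)])

def micro_conditioning(orig_size, crop_top_left, target_size, dim=256):
    """Embed the six conditioning scalars (original H/W, crop top/left,
    target H/W) and concatenate them into one vector, which would then be
    added to the UNet's timestep embedding."""
    scalars = [*orig_size, *crop_top_left, *target_size]
    return np.concatenate([sinusoidal_embedding(s, dim) for s in scalars])

emb = micro_conditioning(orig_size=(1024, 1024),
                         crop_top_left=(0, 0),
                         target_size=(1024, 1024))
print(emb.shape)  # 6 scalars x 256 dims = (1536,)
```

Because the conditioning is just a feature vector added alongside the timestep embedding, the same trained model can be steered at sampling time, e.g. by requesting centered crops (`crop_top_left=(0, 0)`) or a particular apparent resolution.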