

CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images

October 25, 2023
Authors: Aaron Gokaslan, A. Feder Cooper, Jasmine Collins, Landan Seguin, Austin Jacobson, Mihir Patel, Jonathan Frankle, Cory Stephenson, Volodymyr Kuleshov
cs.AI

Abstract

We assemble a dataset of Creative-Commons-licensed (CC) images, which we use to train a set of open diffusion models that are qualitatively competitive with Stable Diffusion 2 (SD2). This task presents two challenges: (1) high-resolution CC images lack the captions necessary to train text-to-image generative models; (2) CC images are relatively scarce. In turn, to address these challenges, we use an intuitive transfer learning technique to produce a set of high-quality synthetic captions paired with curated CC images. We then develop a data- and compute-efficient training recipe that requires as little as 3% of the LAION-2B data needed to train existing SD2 models, but obtains comparable quality. These results indicate that we have a sufficient number of CC images (~70 million) for training high-quality models. Our training recipe also implements a variety of optimizations that achieve ~3X training speed-ups, enabling rapid model iteration. We leverage this recipe to train several high-quality text-to-image models, which we dub the CommonCanvas family. Our largest model achieves comparable performance to SD2 on a human evaluation, despite being trained on our CC dataset that is significantly smaller than LAION and using synthetic captions for training. We release our models, data, and code at https://github.com/mosaicml/diffusion/blob/main/assets/common-canvas.md
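The synthetic-captioning step described above — pairing curated CC images with captions produced by a pretrained model — can be sketched as follows. This is a minimal illustration only: `generate_caption` is a hypothetical stand-in for whatever pretrained image-captioning model is transferred to this task; the abstract does not name the model or its interface.

```python
# Sketch of the synthetic-captioning pipeline: pair each curated
# Creative-Commons image with a model-generated caption.

def generate_caption(image_path: str) -> str:
    # Hypothetical placeholder. A real pipeline would load a pretrained
    # image-captioning model here and run inference on the image;
    # this stub only illustrates the data flow.
    return f"synthetic caption for {image_path}"

def build_training_pairs(cc_image_paths: list[str]) -> list[tuple[str, str]]:
    """Return (image_path, synthetic_caption) pairs for training
    a text-to-image model on otherwise uncaptioned CC images."""
    return [(path, generate_caption(path)) for path in cc_image_paths]

pairs = build_training_pairs(["img_0001.jpg", "img_0002.jpg"])
for path, caption in pairs:
    print(path, "->", caption)
```

The resulting (image, caption) pairs take the place of the web-scraped alt-text captions that datasets like LAION provide, which is what allows training to proceed on caption-free CC images.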