CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images
October 25, 2023
Authors: Aaron Gokaslan, A. Feder Cooper, Jasmine Collins, Landan Seguin, Austin Jacobson, Mihir Patel, Jonathan Frankle, Cory Stephenson, Volodymyr Kuleshov
cs.AI
Abstract
We assemble a dataset of Creative-Commons-licensed (CC) images, which we use
to train a set of open diffusion models that are qualitatively competitive with
Stable Diffusion 2 (SD2). This task presents two challenges: (1)
high-resolution CC images lack the captions necessary to train text-to-image
generative models; (2) CC images are relatively scarce. To address
these challenges, we use an intuitive transfer learning technique to produce a
set of high-quality synthetic captions paired with curated CC images. We then
develop a data- and compute-efficient training recipe that requires as little
as 3% of the LAION-2B data needed to train existing SD2 models, but obtains
comparable quality. These results indicate that we have a sufficient number of
CC images (~70 million) for training high-quality models. Our training recipe
also implements a variety of optimizations that achieve ~3X training speed-ups,
enabling rapid model iteration. We leverage this recipe to train several
high-quality text-to-image models, which we dub the CommonCanvas family. Our
largest model achieves comparable performance to SD2 on a human evaluation,
despite being trained on our CC dataset, which is significantly smaller than
LAION, and with synthetic captions. We release our models, data,
and code at
https://github.com/mosaicml/diffusion/blob/main/assets/common-canvas.md