
How far can we go with ImageNet for Text-to-Image generation?

February 28, 2025
作者: L. Degeorge, A. Ghosh, N. Dufour, D. Picard, V. Kalogeiton
cs.AI

Abstract

Recent text-to-image (T2I) generation models have achieved remarkable results by training on billion-scale datasets, following a "bigger is better" paradigm that prioritizes data quantity over quality. We challenge this established paradigm by demonstrating that strategic data augmentation of small, well-curated datasets can match or outperform models trained on massive web-scraped collections. Using only ImageNet enhanced with well-designed text and image augmentations, we achieve a +2 overall score over SD-XL on GenEval and +5 on DPGBench while using just 1/10th the parameters and 1/1000th the training images. Our results suggest that strategic data augmentation, rather than massive datasets, could offer a more sustainable path forward for T2I generation.
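The abstract credits much of the gain to "well-designed text and image augmentations" of ImageNet, whose labels are bare class names rather than descriptive captions. The paper's exact recipe is not given here; the sketch below illustrates one common form of text augmentation under that assumption: expanding a class label into varied natural-language prompts via templates (the template strings and function name are illustrative, not from the paper).

```python
import random

# Hypothetical prompt templates for recaptioning ImageNet class labels.
# The actual augmentations used by the paper are not specified in this
# abstract; these are illustrative placeholders.
TEMPLATES = [
    "a photo of a {}",
    "a close-up photograph of a {}",
    "a {} in its natural setting",
    "an artistic rendering of a {}",
]

def augment_caption(class_name: str, rng: random.Random) -> str:
    """Turn a bare class label into a randomly templated text prompt."""
    return rng.choice(TEMPLATES).format(class_name)

# Example: each call yields a differently phrased caption for the same image.
rng = random.Random(0)
for _ in range(3):
    print(augment_caption("golden retriever", rng))
```

Pairing each training image with a freshly sampled caption on every epoch effectively multiplies the diversity of (image, text) pairs without collecting new data, which is the kind of leverage the abstract attributes to augmentation.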

