Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning via Image-Guided Diffusion

October 17, 2024
作者: Yijun Liang, Shweta Bhardwaj, Tianyi Zhou
cs.AI

Abstract

Low-quality or scarce data has posed significant challenges for training deep neural networks in practice. While classical data augmentation cannot contribute substantially different new data, diffusion models open up a new door to building self-evolving AI by generating high-quality and diverse synthetic data through text-guided prompts. However, text-only guidance cannot control how close the synthetic images are to the original images, resulting in out-of-distribution data that is detrimental to model performance. To overcome this limitation, we study image guidance to achieve a spectrum of interpolations between synthetic and real images. With stronger image guidance, the generated images are similar to the training data but hard to learn; with weaker image guidance, the synthetic images are easier for the model to learn but exhibit a larger distribution gap from the original data. The generated full spectrum of data enables us to build a novel "Diffusion Curriculum (DisCL)". DisCL adjusts the image-guidance level of image synthesis for each training stage: it identifies and focuses on hard samples for the model and assesses the most effective guidance level of synthetic images to improve learning of hard data. We apply DisCL to two challenging tasks: long-tail (LT) classification and learning from low-quality data. DisCL focuses on high-quality, lower-guidance images to learn prototypical features, as a warm-up for learning higher-guidance images that may be weak in diversity or quality. Extensive experiments show a gain of 2.7% and 2.1% in OOD and ID macro-accuracy when applying DisCL to the iWildCam dataset. On ImageNet-LT, DisCL improves the base model's tail-class accuracy from 4.4% to 23.64% and leads to a 4.02% improvement in all-class accuracy.
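
The abstract's idea of a guidance spectrum and a synthetic-to-real schedule can be illustrated with a minimal sketch (an assumption, not the authors' released code): an off-the-shelf img2img diffusion pipeline from the `diffusers` library synthesizes images at several image-guidance levels for each hard real sample, and a simple schedule moves training from weaker to stronger image guidance. The model name, the strength values, and the schedule below are placeholders for illustration only; the paper's actual hard-sample selection and guidance-level assessment are not reproduced here.

```python
# Illustrative sketch only (assumption, not the authors' implementation):
# approximate "image guidance" with the `strength` parameter of an
# off-the-shelf img2img diffusion pipeline, where lower strength keeps the
# output closer to the real input image (i.e., stronger image guidance).
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

# Placeholder model choice; any img2img-capable diffusion model would do.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")


def synthesize_spectrum(real_image: Image.Image, prompt: str,
                        strengths=(0.9, 0.7, 0.5, 0.3)):
    """Generate one synthetic image per image-guidance level.

    High strength -> text-dominated output: easier, more prototypical images.
    Low strength  -> output stays close to the (possibly hard or low-quality)
    real image, i.e., stronger image guidance.
    """
    return {s: pipe(prompt=prompt, image=real_image, strength=s,
                    guidance_scale=7.5).images[0]
            for s in strengths}


def strength_for_epoch(epoch: int, total_epochs: int,
                       schedule=(0.9, 0.7, 0.5, 0.3)):
    """Synthetic-to-real curriculum: warm up on weak-image-guidance
    (high-strength) images, then move toward strong-image-guidance images
    that resemble the hard real samples."""
    stage = min(len(schedule) - 1, epoch * len(schedule) // total_epochs)
    return schedule[stage]
```

In such a setup, the training batch for a hard sample at a given epoch would be drawn from the spectrum entry matching `strength_for_epoch(epoch, total_epochs)`, with the number of stages and guidance levels treated as hyperparameters.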
