Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning via Image-Guided Diffusion

Abstract



Low-quality or scarce data has posed significant challenges for training deep neural networks in practice. While classical data augmentation cannot contribute substantially different new data, diffusion models open a new door to building self-evolving AI by generating high-quality and diverse synthetic data from text-guided prompts. However, text-only guidance cannot control how close the synthetic images are to the original images, which can produce out-of-distribution data that is detrimental to model performance. To overcome this limitation, we study image guidance to achieve a spectrum of interpolations between synthetic and real images. With stronger image guidance, the generated images are closer to the training data but harder to learn; with weaker image guidance, the synthetic images are easier for the model to learn but have a larger distribution gap from the original data. The resulting full spectrum of data enables us to build a novel "Diffusion Curriculum (DisCL)". DisCL adjusts the image guidance level of image synthesis at each training stage: it identifies the samples the model finds hard and assesses which guidance level of synthetic images most effectively improves learning on them. We apply DisCL to two challenging tasks: long-tail (LT) classification and learning from low-quality data. DisCL first focuses on high-quality, lower-guidance images to learn prototypical features, as a warm-up for learning higher-guidance images that may be weaker in diversity or quality. Extensive experiments show that applying DisCL to the iWildCam dataset improves OOD and ID macro-accuracy by 2.7% and 2.1%, respectively. On ImageNet-LT, DisCL raises the base model's tail-class accuracy from 4.4% to 23.64% and improves all-class accuracy by 4.02%.
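The abstract describes DisCL only at a high level. The following is a minimal Python sketch, under stated assumptions, of its curriculum logic: a synthetic-to-real schedule that moves from weak to strong image guidance across training stages, a way to flag hard samples, and a way to pick the guidance level that helps those samples most. The function names, the convention that a larger guidance level means a synthetic image closer to the real one, the probability-threshold hardness criterion, and the `gain_by_level` input are all hypothetical illustrations, not the authors' implementation; the diffusion-based image synthesis itself is omitted.

```python
from typing import Dict, List, Sequence


def select_hard_samples(true_label_probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Indices of samples the model finds hard.

    Hypothetical criterion: a sample is "hard" if the model's predicted
    probability for its true label is below `threshold`.
    """
    return [i for i, p in enumerate(true_label_probs) if p < threshold]


def guidance_schedule(num_stages: int, levels: Sequence[float]) -> List[float]:
    """Synthetic-to-real schedule: weakest guidance first, strongest last.

    Assumed convention: a larger level means stronger image guidance,
    i.e. a synthetic image closer to the real training image.
    """
    if num_stages < 1 or not levels:
        raise ValueError("need at least one stage and one guidance level")
    ordered = sorted(levels)
    if num_stages == 1:
        return [ordered[-1]]
    last = len(ordered) - 1
    # Spread the sorted levels over the stages so the schedule is
    # non-decreasing, starts at the weakest level and ends at the strongest.
    return [ordered[s * last // (num_stages - 1)] for s in range(num_stages)]


def pick_adaptive_level(gain_by_level: Dict[float, float]) -> float:
    """Pick the guidance level whose synthetic images helped hard samples most.

    `gain_by_level` is a hypothetical input, e.g. the accuracy gain on the
    hard samples after a short round of training on images of each level.
    """
    return max(gain_by_level, key=gain_by_level.get)


if __name__ == "__main__":
    print(guidance_schedule(3, [0.9, 0.3, 0.7, 0.5]))  # [0.3, 0.5, 0.9]
    print(select_hard_samples([0.9, 0.2, 0.6, 0.4]))  # [1, 3]
    print(pick_adaptive_level({0.3: 0.01, 0.5: 0.04, 0.7: 0.02}))  # 0.5
```

A fixed schedule like `guidance_schedule` encodes only the low-to-high warm-up ordering; `pick_adaptive_level` stands in for the adaptive variant, where the level is chosen at each stage from measured feedback on the current hard samples, which would be re-identified after every stage.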
