Random Field Augmentations for Self-Supervised Representation Learning

Abstract

Self-supervised representation learning depends heavily on data augmentations to specify the invariances encoded in the learned representations. Previous work has shown that applying diverse data augmentations is crucial to downstream performance, yet augmentation techniques remain under-explored. In this work, we propose a new family of local transformations based on Gaussian random fields to generate image augmentations for self-supervised representation learning. These transformations generalize the well-established affine and color transformations (translation, rotation, color jitter, etc.) and greatly enlarge the space of augmentations by allowing transformation parameter values to vary from pixel to pixel. The parameters are treated as continuous functions of spatial coordinates and modeled as independent Gaussian random fields. Empirical results show that the new transformations are effective for self-supervised representation learning: we achieve a 1.7% top-1 accuracy improvement over the baseline on ImageNet downstream classification and a 3.6% improvement on out-of-distribution iNaturalist downstream classification. However, because the new transformations are so flexible, the learned representations are sensitive to hyperparameters. While mild transformations improve representations, we observe that strong transformations can degrade the structure of an image, indicating that balancing the diversity and strength of augmentations is important for improving the generalization of learned representations.
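To make the idea of pixel-varying transformation parameters concrete, here is a minimal NumPy sketch. It is not the authors' implementation. Several details are assumptions. Each parameter field uses a squared-exponential covariance with length scale ℓ, sampled by low-pass filtering white noise in the Fourier domain, which makes the field periodic at the borders. Per-pixel translation uses bilinear resampling with border clamping. Per-pixel brightness multiplies each pixel by `exp(field)`. All function and parameter names (`length_scale`, `shift_std`, `std`) are hypothetical. A constant field would recover an ordinary global translation or brightness jitter, and a standard deviation of zero leaves the image unchanged.

```python
import numpy as np


def sample_gaussian_random_field(shape, length_scale, std, rng):
    """Sample a zero-mean, stationary 2-D Gaussian random field.

    Assumption: squared-exponential covariance exp(-r^2 / (2 * length_scale^2)),
    realized by filtering white noise with the square root of its power
    spectrum (the FFT makes the field periodic at the image borders).
    """
    h, w = shape
    noise = rng.standard_normal((h, w))
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies, cycles per pixel
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies, cycles per pixel
    amplitude = np.exp(-((np.pi * length_scale) ** 2) * (fx**2 + fy**2))
    field = np.real(np.fft.ifft2(np.fft.fft2(noise) * amplitude))
    field -= field.mean()
    field /= field.std() + 1e-12  # rescale to unit empirical variance
    return std * field


def bilinear_sample(image, src_y, src_x):
    """Read `image` at fractional (src_y, src_x) coordinates already inside the image."""
    h, w = image.shape[:2]
    img = image if image.ndim == 3 else image[..., None]
    y0 = np.floor(src_y).astype(int)
    x0 = np.floor(src_x).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (src_y - y0)[..., None]
    wx = (src_x - x0)[..., None]
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bottom = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    out = (1 - wy) * top + wy * bottom
    return out if image.ndim == 3 else out[..., 0]


def random_field_translate(image, length_scale, shift_std, rng):
    """Per-pixel translation: each pixel gets its own (dy, dx) from two independent GRFs."""
    h, w = image.shape[:2]
    dy = sample_gaussian_random_field((h, w), length_scale, shift_std, rng)
    dx = sample_gaussian_random_field((h, w), length_scale, shift_std, rng)
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Pull each output pixel from its displaced source location, clamped to the border.
    src_y = np.clip(ys + dy, 0, h - 1)
    src_x = np.clip(xs + dx, 0, w - 1)
    return bilinear_sample(image, src_y, src_x)


def random_field_brightness(image, length_scale, std, rng):
    """Per-pixel brightness jitter: scale each pixel by exp(GRF) (assumed form)."""
    factor = np.exp(sample_gaussian_random_field(image.shape[:2], length_scale, std, rng))
    if image.ndim == 3:
        factor = factor[..., None]  # same factor for every channel of a pixel
    return np.clip(image * factor, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((64, 64, 3))  # stand-in for an RGB image with values in [0, 1]
    view = random_field_translate(image, length_scale=8.0, shift_std=2.0, rng=rng)
    view = random_field_brightness(view, length_scale=16.0, std=0.1, rng=rng)
    print(view.shape)  # (64, 64, 3)
```

In this sketch, larger `shift_std` or `std` values give stronger augmentations that distort image structure more, while `length_scale` controls how quickly the parameters change across the image. This mirrors the diversity-versus-strength trade-off described above.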
