Random Field Augmentations for Self-Supervised Representation Learning

November 7, 2023
Authors: Philip Andrew Mansfield, Arash Afkanpour, Warren Richard Morningstar, Karan Singhal
cs.AI

Abstract

Self-supervised representation learning is heavily dependent on data augmentations to specify the invariances encoded in representations. Previous work has shown that applying diverse data augmentations is crucial to downstream performance, but augmentation techniques remain under-explored. In this work, we propose a new family of local transformations based on Gaussian random fields to generate image augmentations for self-supervised representation learning. These transformations generalize the well-established affine and color transformations (translation, rotation, color jitter, etc.) and greatly increase the space of augmentations by allowing transformation parameter values to vary from pixel to pixel. The parameters are treated as continuous functions of spatial coordinates, and modeled as independent Gaussian random fields. Empirical results show the effectiveness of the new transformations for self-supervised representation learning. Specifically, we achieve a 1.7% top-1 accuracy improvement over baseline on ImageNet downstream classification, and a 3.6% improvement on out-of-distribution iNaturalist downstream classification. However, due to the flexibility of the new transformations, learned representations are sensitive to hyperparameters. While mild transformations improve representations, we observe that strong transformations can degrade the structure of an image, indicating that balancing the diversity and strength of augmentations is important for improving generalization of learned representations.
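To make the core idea concrete, here is a minimal sketch (not the authors' implementation) of a random-field augmentation, assuming the simplest field construction: Gaussian-blurred white noise, normalized and scaled. The function names, the `smoothness` and `strength` parameters, and the choice of per-pixel translation plus per-pixel brightness gain are illustrative assumptions; the paper's actual covariance kernels and hyperparameter ranges may differ.

```python
# Sketch of random-field augmentations (assumed construction, not the
# paper's exact method). Requires NumPy and SciPy.
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def gaussian_random_field(shape, smoothness, strength, rng):
    """Smooth zero-mean field: white noise blurred by a Gaussian kernel.

    `smoothness` (blur sigma) and `strength` (amplitude) are
    illustrative hyperparameters, not values from the paper.
    """
    field = gaussian_filter(rng.standard_normal(shape), sigma=smoothness)
    field /= field.std() + 1e-8  # unit variance, then rescale
    return strength * field

def random_field_augment(image, smoothness=8.0, strength=4.0, rng=None):
    """Per-pixel translation (local warp) plus per-pixel brightness jitter.

    `image` is an HxWxC float array in [0, 1]. Each transformation
    parameter is an independent Gaussian random field over (x, y),
    generalizing a global transform whose parameters are constants.
    """
    rng = rng or np.random.default_rng()
    h, w, c = image.shape

    # Pixel-wise translation: dx(x, y) and dy(x, y) are independent
    # fields; a constant field would recover ordinary translation.
    dy = gaussian_random_field((h, w), smoothness, strength, rng)
    dx = gaussian_random_field((h, w), smoothness, strength, rng)
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([yy + dy, xx + dx])

    warped = np.stack(
        [map_coordinates(image[..., k], coords, order=1, mode="reflect")
         for k in range(c)],
        axis=-1,
    )

    # Pixel-wise brightness: a multiplicative field around 1.0,
    # generalizing global brightness jitter (a single constant gain).
    gain = 1.0 + gaussian_random_field((h, w), smoothness, 0.1, rng)
    return np.clip(warped * gain[..., None], 0.0, 1.0)

# Usage: apply independently to each view of a contrastive pair, e.g.
#   view1, view2 = random_field_augment(img), random_field_augment(img)
```

Note how the hyperparameters interact with the abstract's caveat: a large `strength` relative to `smoothness` produces a warp that tears apart local image structure, which is the failure mode the authors observe when transformations are too strong.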