Random Field Augmentations for Self-Supervised Representation Learning
November 7, 2023
Authors: Philip Andrew Mansfield, Arash Afkanpour, Warren Richard Morningstar, Karan Singhal
cs.AI
Abstract
Self-supervised representation learning is heavily dependent on data
augmentations to specify the invariances encoded in representations. Previous
work has shown that applying diverse data augmentations is crucial to
downstream performance, but augmentation techniques remain under-explored. In
this work, we propose a new family of local transformations based on Gaussian
random fields to generate image augmentations for self-supervised
representation learning. These transformations generalize the well-established
affine and color transformations (translation, rotation, color jitter, etc.)
and greatly increase the space of augmentations by allowing transformation
parameter values to vary from pixel to pixel. The parameters are treated as
continuous functions of spatial coordinates, and modeled as independent
Gaussian random fields. Empirical results show the effectiveness of the new
transformations for self-supervised representation learning. Specifically, we
achieve a 1.7% top-1 accuracy improvement over baseline on ImageNet downstream
classification, and a 3.6% improvement on out-of-distribution iNaturalist
downstream classification. However, due to the flexibility of the new
transformations, learned representations are sensitive to hyperparameters.
While mild transformations improve representations, we observe that strong
transformations can degrade the structure of an image, indicating that
balancing the diversity and strength of augmentations is important for
improving generalization of learned representations.
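To illustrate the core idea, the sketch below shows one way a local transformation of this kind can be built: a per-pixel translation whose displacement field is sampled from smooth Gaussian random fields (approximated here by low-pass-filtered white noise). This is an illustrative approximation, not the authors' implementation; the function names, the filtered-noise construction, and the `smoothness`/`strength` parameters are assumptions for the example.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def gaussian_random_field(shape, smoothness=8.0, strength=2.0, rng=None):
    """Approximate a smooth Gaussian random field by low-pass filtering
    white noise, then rescale so displacements stay within `strength` pixels."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(shape)
    field = gaussian_filter(noise, sigma=smoothness)
    field /= (np.abs(field).max() + 1e-8)  # normalize to [-1, 1]
    return strength * field

def random_field_translation(image, smoothness=8.0, strength=2.0, rng=None):
    """Generalized translation: instead of one global offset, each pixel's
    displacement (dy, dx) is drawn from independent Gaussian random fields."""
    h, w = image.shape[:2]
    dy = gaussian_random_field((h, w), smoothness, strength, rng)
    dx = gaussian_random_field((h, w), smoothness, strength, rng)
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([ys + dy, xs + dx])  # sample locations per output pixel
    if image.ndim == 2:
        return map_coordinates(image, coords, order=1, mode="reflect")
    return np.stack(
        [map_coordinates(image[..., c], coords, order=1, mode="reflect")
         for c in range(image.shape[-1])], axis=-1)
```

With `strength` near zero this reduces to (almost) the identity, while a spatially constant field would recover an ordinary global translation; the abstract's caution applies here too, since large `strength` relative to `smoothness` visibly distorts image structure.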