R-MAE: Regions Meet Masked Autoencoders
June 8, 2023
Authors: Duy-Kien Nguyen, Vaibhav Aggarwal, Yanghao Li, Martin R. Oswald, Alexander Kirillov, Cees G. M. Snoek, Xinlei Chen
cs.AI
Abstract
Vision-specific concepts such as "region" have played a key role in extending
general machine learning frameworks to tasks like object detection. Given the
success of region-based detectors for supervised learning and the progress of
intra-image methods for contrastive learning, we explore the use of regions for
reconstructive pre-training. Starting from Masked Autoencoding (MAE) both as a
baseline and an inspiration, we propose a parallel pretext task tailored to
address the one-to-many mapping between images and regions. Since such regions
can be generated in an unsupervised way, our approach (R-MAE) inherits the wide
applicability from MAE, while being more "region-aware". We conduct thorough
analyses during the development of R-MAE, and converge on a variant that is
both effective and efficient (1.3% overhead over MAE). Moreover, it shows
consistent quantitative improvements when generalized to various pre-training
data and downstream detection and segmentation benchmarks. Finally, we provide
extensive qualitative visualizations to enhance the understanding of R-MAE's
behaviour and potential. Code will be made available at
https://github.com/facebookresearch/r-mae.
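
To make the abstract's description concrete, below is a minimal, illustrative PyTorch sketch of the idea: standard MAE-style masked pixel reconstruction paired with a parallel branch that reconstructs masked region maps conditioned on the shared image features, so that one image can supervise many regions. Everything here (the module names such as RMAESketch, the tiny shared encoder, the pooled-feature conditioning, and the equal loss weighting) is an assumption for illustration, not the authors' implementation; see the paper and the linked repository for the actual design.

```python
# Illustrative sketch only: MAE-style pixel reconstruction plus a parallel
# region-map reconstruction branch. All names, shapes, and design choices
# below are assumptions, not the R-MAE implementation.
import torch
import torch.nn as nn


class RMAESketch(nn.Module):
    def __init__(self, dim=192, patch=16, img_size=224, mask_ratio=0.75):
        super().__init__()
        self.p, self.mask_ratio = patch, mask_ratio
        n = (img_size // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, n, dim))
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pixel_embed = nn.Linear(3 * patch * patch, dim)   # RGB patches
        self.region_embed = nn.Linear(patch * patch, dim)      # binary-map patches
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # shared by both branches
        self.pixel_head = nn.Linear(dim, 3 * patch * patch)
        self.region_head = nn.Linear(dim, patch * patch)

    def patchify(self, x):
        # (B, C, H, W) -> (B, N, C * p * p)
        B, C, H, W = x.shape
        x = x.unfold(2, self.p, self.p).unfold(3, self.p, self.p)
        return x.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * self.p * self.p)

    def mask_and_encode(self, tokens):
        # Randomly drop mask_ratio of the tokens, encode the visible ones,
        # then unshuffle a full-length sequence with mask tokens filled in.
        B, N, D = tokens.shape
        n_keep = int(N * (1 - self.mask_ratio))
        ids = torch.rand(B, N, device=tokens.device).argsort(1)
        ids_restore = ids.argsort(1)
        vis = torch.gather(tokens, 1, ids[:, :n_keep, None].expand(-1, -1, D))
        z = self.encoder(vis)
        pad = self.mask_token.expand(B, N - n_keep, D)
        full = torch.gather(torch.cat([z, pad], 1), 1,
                            ids_restore[:, :, None].expand(-1, -1, D))
        masked = torch.ones(B, N, device=tokens.device)
        masked[:, :n_keep] = 0
        return full, torch.gather(masked, 1, ids_restore)  # 1 = was masked

    def forward(self, img, regions):
        # img: (B, 3, H, W); regions: (B, R, H, W) binary maps. One image maps
        # to many regions -- the one-to-many mapping the abstract refers to.
        B, R = regions.shape[:2]
        tgt = self.patchify(img)
        full, masked = self.mask_and_encode(self.pixel_embed(tgt) + self.pos)
        pix_loss = ((self.pixel_head(full) - tgt) ** 2).mean(-1)
        pix_loss = (pix_loss * masked).sum() / masked.sum()

        # Parallel region branch: flatten regions into the batch and
        # reconstruct each masked region map, conditioned on pooled image
        # features (a deliberately crude conditioning choice).
        rmaps = regions.reshape(B * R, 1, *regions.shape[2:]).float()
        rtgt = self.patchify(rmaps)
        ctx = full.mean(1, keepdim=True).repeat_interleave(R, dim=0)
        rfull, rmasked = self.mask_and_encode(
            self.region_embed(rtgt) + self.pos + ctx)
        reg_loss = ((self.region_head(rfull) - rtgt) ** 2).mean(-1)
        reg_loss = (reg_loss * rmasked).sum() / rmasked.sum()
        return pix_loss + reg_loss  # equal weighting is also an assumption


if __name__ == "__main__":
    model = RMAESketch()
    img = torch.randn(2, 3, 224, 224)
    # Region maps could come from any unsupervised segmentation method.
    regions = (torch.rand(2, 4, 224, 224) > 0.5).float()
    loss = model(img, regions)
    loss.backward()
```

Flattening the region axis into the batch is the simplest way to handle the one-to-many mapping, but it scales linearly with the number of regions per image; per the abstract, the authors converge on a variant that adds only 1.3% overhead over MAE.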