MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features
July 24, 2023
Authors: Adrien Bardes, Jean Ponce, Yann LeCun
cs.AI
Abstract
Self-supervised learning of visual representations has focused on learning
content features, which do not capture object motion or location and instead
serve to identify and differentiate objects in images and videos. Optical flow
estimation, on the other hand, is a task that does not involve understanding
the content of the images on which it is estimated. We unify the two approaches
and introduce MC-JEPA, a joint-embedding predictive architecture and
self-supervised learning approach that jointly learns optical flow and content
features within a shared encoder. We demonstrate that the two associated
objectives, the optical flow estimation objective and the self-supervised
learning objective, benefit from each other, so that the encoder learns content
features that incorporate motion information. The proposed approach performs on
par with existing methods on unsupervised optical flow benchmarks, and with
common self-supervised learning approaches on downstream tasks such as semantic
segmentation of images and videos.
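
The idea of training one shared encoder with two objectives can be illustrated with a minimal sketch. This is not the authors' implementation: the one-layer encoder, the toy motion loss (predicting next-frame features rather than estimating optical flow), the toy content loss (a plain invariance term between two views), and the names `encode`, `joint_loss`, and `alpha` are all assumptions made for illustration. A practical content objective would also need a mechanism that prevents all embeddings from collapsing to the same vector, which is omitted here.

```python
def linear(x, w):
    """Apply a weight matrix w (list of rows) to a vector x."""
    return [sum(xi * wi for xi, wi in zip(x, row)) for row in w]


def encode(x, w):
    """Hypothetical shared encoder: one linear layer followed by ReLU."""
    return [max(v, 0.0) for v in linear(x, w)]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def joint_loss(frame_t, frame_t1, view_a, view_b, w, flow_w, alpha=1.0):
    """Weighted sum of a toy motion loss and a toy content loss.

    Both losses are computed on features from the same encoder weights w,
    so a gradient step on the sum would update the encoder for both tasks.
    """
    # Toy motion objective: predict next-frame features from current ones.
    z_t, z_t1 = encode(frame_t, w), encode(frame_t1, w)
    l_motion = mse(linear(z_t, flow_w), z_t1)

    # Toy content objective: embeddings of two views should agree.
    l_content = mse(encode(view_a, w), encode(view_b, w))

    return l_motion + alpha * l_content, l_motion, l_content


identity = [[1.0, 0.0], [0.0, 1.0]]
total, l_motion, l_content = joint_loss(
    [1.0, 2.0], [1.0, 2.0],   # two identical consecutive "frames"
    [1.0, 0.0], [0.0, 1.0],   # two differing "views"
    identity, identity, alpha=0.5,
)
print(total, l_motion, l_content)
```

The weight `alpha` is a hypothetical knob for balancing the two terms; the abstract does not say how the objectives are combined.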