MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features
July 24, 2023
Authors: Adrien Bardes, Jean Ponce, Yann LeCun
cs.AI
Abstract
Self-supervised learning of visual representations has focused on learning
content features, which identify and differentiate objects in images and
videos but do not capture object motion or location. On the other hand,
optical flow estimation is a task that does not involve understanding the
content of the images on which it is estimated. We unify the two approaches
and introduce MC-JEPA, a joint-embedding predictive architecture and
self-supervised learning approach that jointly learns optical flow and content
features within a shared encoder. We demonstrate that the two associated
objectives, the optical flow estimation objective and the self-supervised
learning objective, benefit from each other and thus learn content features
that incorporate motion information. The proposed approach achieves
performance on par with existing methods on unsupervised optical flow
benchmarks, as well as with common self-supervised learning approaches on
downstream tasks such as semantic segmentation of images and videos.