The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning

Abstract

The mechanisms behind the success of multi-view self-supervised learning (MVSSL) are not yet fully understood. Contrastive MVSSL methods have been studied through the lens of InfoNCE, a lower bound on the mutual information (MI). However, the relation between other MVSSL methods and MI remains unclear. We consider a different lower bound on the MI, consisting of an entropy term and a reconstruction term (ER), and analyze the main MVSSL families through its lens. Through this ER bound, we show that clustering-based methods such as DeepCluster and SwAV maximize the MI. We also re-interpret the mechanisms of distillation-based approaches such as BYOL and DINO, showing that they explicitly maximize the reconstruction term and implicitly encourage a stable entropy, and we confirm this empirically. We show that replacing the objectives of common MVSSL methods with this ER bound achieves competitive performance, while making them stable when training with smaller batch sizes or smaller exponential moving average (EMA) coefficients. GitHub repository: https://github.com/apple/ml-entropy-reconstruction
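
For readers who want to see how a bound made of an entropy term and a reconstruction term can arise, the following is a minimal sketch of one standard derivation. The notation is an assumption introduced here for illustration and is not taken from the abstract: $Z_1$ and $Z_2$ stand for the representations of two views, $p$ for their true joint distribution, and $q(z_2 \mid z_1)$ for any variational "reconstruction" distribution. The exact form used by the authors may differ.

```latex
\begin{align}
I(Z_1; Z_2)
  &= H(Z_2) - H(Z_2 \mid Z_1) \\
  &= H(Z_2) + \mathbb{E}_{p(z_1, z_2)}\big[\log p(z_2 \mid z_1)\big] \\
  &= H(Z_2) + \mathbb{E}_{p(z_1, z_2)}\big[\log q(z_2 \mid z_1)\big]
     + \mathbb{E}_{p(z_1)}\Big[ D_{\mathrm{KL}}\big(p(\cdot \mid z_1) \,\|\, q(\cdot \mid z_1)\big) \Big] \\
  &\geq \underbrace{H(Z_2)}_{\text{entropy}}
     + \underbrace{\mathbb{E}_{p(z_1, z_2)}\big[\log q(z_2 \mid z_1)\big]}_{\text{reconstruction}}
\end{align}
```

The inequality holds because the KL divergence is non-negative, and it is tight when $q(z_2 \mid z_1) = p(z_2 \mid z_1)$. Under this reading, an objective that only increases the reconstruction term raises the bound only as long as the entropy term does not collapse, which is consistent with the abstract's remark that distillation-based methods implicitly encourage a stable entropy.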
