The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning

Abstract

The mechanisms behind the success of multi-view self-supervised learning (MVSSL) are not yet fully understood. Contrastive MVSSL methods have been studied through the lens of InfoNCE, a lower bound on the mutual information (MI). However, the relation between other MVSSL methods and the MI remains unclear. We consider a different lower bound on the MI, consisting of an entropy term and a reconstruction term (ER), and analyze the main MVSSL families through its lens. Through this ER bound, we show that clustering-based methods such as DeepCluster and SwAV maximize the MI. We also re-interpret the mechanisms of distillation-based approaches such as BYOL and DINO, showing that they explicitly maximize the reconstruction term and implicitly encourage a stable entropy, and we confirm this empirically. We show that replacing the objectives of common MVSSL methods with this ER bound achieves competitive performance, while making them stable when training with smaller batch sizes or smaller exponential moving average (EMA) coefficients. GitHub repo: https://github.com/apple/ml-entropy-reconstruction.
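
For orientation, a bound made of an entropy term and a reconstruction term can be sketched with the standard variational argument below. The notation is an assumption made for illustration and is not taken from the abstract: Z_1 and Z_2 denote the representations of the two views, and q is any auxiliary conditional distribution used to predict Z_2 from Z_1. The exact form used in the paper may differ.

```latex
\begin{align}
I(Z_1; Z_2)
  &= H(Z_2) - H(Z_2 \mid Z_1) \\
  &= H(Z_2) + \mathbb{E}_{p(z_1, z_2)}\bigl[\log p(z_2 \mid z_1)\bigr] \\
  &= H(Z_2) + \mathbb{E}_{p(z_1, z_2)}\bigl[\log q(z_2 \mid z_1)\bigr]
     + \mathbb{E}_{p(z_1)}\Bigl[ D_{\mathrm{KL}}\bigl(p(\cdot \mid z_1) \,\|\, q(\cdot \mid z_1)\bigr) \Bigr] \\
  &\ge \underbrace{H(Z_2)}_{\text{entropy}}
     + \underbrace{\mathbb{E}_{p(z_1, z_2)}\bigl[\log q(z_2 \mid z_1)\bigr]}_{\text{reconstruction}}
\end{align}
```

The inequality follows because the KL divergence is non-negative, and the bound is tight when q matches the true conditional p(z_2 | z_1). Under this reading, maximizing only the reconstruction term is not enough to increase the bound: the entropy term must also be kept from collapsing.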
