The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning

Abstract

The mechanisms behind the success of multi-view self-supervised learning (MVSSL) are not yet fully understood. Contrastive MVSSL methods have been studied through the lens of InfoNCE, a lower bound on the mutual information (MI). However, the relation between other MVSSL methods and MI remains unclear. We consider a different lower bound on the MI, consisting of an entropy term and a reconstruction term (ER), and analyze the main MVSSL families through its lens. Through this ER bound, we show that clustering-based methods such as DeepCluster and SwAV maximize the MI. We also re-interpret the mechanisms of distillation-based approaches such as BYOL and DINO, showing that they explicitly maximize the reconstruction term and implicitly encourage a stable entropy, and we confirm this empirically. We show that replacing the objectives of common MVSSL methods with this ER bound achieves competitive performance, while making them stable when training with smaller batch sizes or smaller exponential moving average (EMA) coefficients. GitHub repo: https://github.com/apple/ml-entropy-reconstruction.
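
For orientation, a lower bound on the MI made of an entropy term and a reconstruction term can be sketched with the standard variational argument below. This is a generic derivation under assumed notation, not a formula quoted from the paper: $Z_1$ and $Z_2$ stand for the representations of two views, $p$ for their true joint distribution, and $q$ for any variational density used to reconstruct $Z_2$ from $Z_1$.

```latex
\begin{align}
I(Z_1; Z_2) &= H(Z_2) - H(Z_2 \mid Z_1) \\
            &= H(Z_2) + \mathbb{E}_{p(z_1, z_2)}\big[\log p(z_2 \mid z_1)\big] \\
            &\ge \underbrace{H(Z_2)}_{\text{entropy}}
               + \underbrace{\mathbb{E}_{p(z_1, z_2)}\big[\log q(z_2 \mid z_1)\big]}_{\text{reconstruction}}
\end{align}
```

The inequality follows because the gap between the last two lines is $\mathbb{E}_{p(z_1)}\big[\mathrm{KL}\big(p(\cdot \mid z_1) \,\|\, q(\cdot \mid z_1)\big)\big] \ge 0$. Under this reading, a method can raise the bound either by increasing the entropy of the representations or by making one view's representation more predictable from the other.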
