The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning

Abstract

The mechanisms behind the success of multi-view self-supervised learning (MVSSL) are not yet fully understood. Contrastive MVSSL methods have been studied through the lens of InfoNCE, a lower bound on the mutual information (MI). However, the relation between other MVSSL methods and the MI remains unclear. We consider a different lower bound on the MI, consisting of an entropy term and a reconstruction term (ER), and analyze the main MVSSL families through its lens. Using this ER bound, we show that clustering-based methods such as DeepCluster and SwAV maximize the MI. We also reinterpret the mechanisms of distillation-based approaches such as BYOL and DINO, showing that they explicitly maximize the reconstruction term and implicitly encourage a stable entropy, which we confirm empirically. Finally, we show that replacing the objectives of common MVSSL methods with this ER bound achieves competitive performance while making training stable with smaller batch sizes or smaller exponential moving average (EMA) coefficients. GitHub repository: https://github.com/apple/ml-entropy-reconstruction.
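The ER bound follows from writing the mutual information as I(Z1; Z2) = H(Z2) − H(Z2 | Z1) and replacing the conditional entropy with the cross-entropy of any predictor q, which can only lower the value: I(Z1; Z2) ≥ H(Z2) + E[log q(Z2 | Z1)]. The first term is the entropy, the second the reconstruction. The NumPy sketch below is a minimal, hypothetical illustration for discrete, cluster-style embeddings. It assumes K clusters, softmax assignments for each view, a plug-in estimate of H(Z2) taken from the batch-averaged assignment of view 2, and view 1's softmax used as q. The function names `softmax` and `er_bound` are illustrative and not taken from the linked repository.

```python
from typing import Tuple

import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def er_bound(logits_view1: np.ndarray, logits_view2: np.ndarray,
             eps: float = 1e-12) -> Tuple[float, float, float]:
    """Hypothetical plug-in estimate of H(Z2) + E[log q(Z2 | Z1)] in nats.

    Both inputs have shape (batch, K): one row per image, one column per cluster.
    Returns (entropy, reconstruction, entropy + reconstruction).
    """
    p1 = softmax(logits_view1)  # used as q(z2 | z1): view 1 predicts view 2's cluster
    p2 = softmax(logits_view2)  # soft cluster assignment of view 2 (the target)

    # Entropy term: entropy of the batch-averaged assignment, a plug-in estimate of H(Z2).
    marginal = p2.mean(axis=0)
    entropy = float(-(marginal * np.log(marginal + eps)).sum())

    # Reconstruction term: mean log-likelihood of view 2's assignment under view 1's prediction.
    reconstruction = float((p2 * np.log(p1 + eps)).sum(axis=1).mean())

    return entropy, reconstruction, entropy + reconstruction


if __name__ == "__main__":
    collapsed = np.zeros((8, 4))   # every image gets the same (uniform) assignment
    confident = 20.0 * np.eye(4)   # each image confidently in its own cluster, views agree
    print(er_bound(collapsed, collapsed))  # ER ≈ 0
    print(er_bound(confident, confident))  # ER ≈ log 4
```

In this sketch, a model that assigns every image the same distribution scores an ER estimate of 0 however confident that distribution is, because the entropy and reconstruction terms cancel; a high value requires both diverse cluster usage across the batch (entropy) and agreement between views (reconstruction).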

