

The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning

July 20, 2023
作者: Borja Rodríguez-Gálvez, Arno Blaas, Pau Rodríguez, Adam Goliński, Xavier Suau, Jason Ramapuram, Dan Busbridge, Luca Zappella
cs.AI

Abstract

The mechanisms behind the success of multi-view self-supervised learning (MVSSL) are not yet fully understood. Contrastive MVSSL methods have been studied through the lens of InfoNCE, a lower bound on the mutual information (MI). However, the relation between other MVSSL methods and MI remains unclear. We consider a different lower bound on the MI consisting of an entropy and a reconstruction term (ER), and analyze the main MVSSL families through its lens. Through this ER bound, we show that clustering-based methods such as DeepCluster and SwAV maximize the MI. We also re-interpret the mechanisms of distillation-based approaches such as BYOL and DINO, showing that they explicitly maximize the reconstruction term and implicitly encourage a stable entropy, and we confirm this empirically. We show that replacing the objectives of common MVSSL methods with this ER bound achieves competitive performance, while making them stable when training with smaller batch sizes or smaller exponential moving average (EMA) coefficients. GitHub repo: https://github.com/apple/ml-entropy-reconstruction.
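A bound of the kind the abstract describes decomposes as entropy plus reconstruction: I(Z1; Z2) ≥ H(Z2) + E[log q(Z2 | Z1)], where q is a variational reconstruction distribution. The sketch below illustrates this shape for soft cluster assignments (as in the clustering-based methods mentioned above). It is a minimal numpy illustration, not the paper's implementation: the function name `er_bound` and the choice of q as the categorical distribution given by the other view's assignments are assumptions for illustration.

```python
import numpy as np

def er_bound(p1, p2, eps=1e-8):
    """Illustrative entropy + reconstruction (ER) objective.

    p1, p2: (batch, K) arrays of soft cluster assignment probabilities
    for two views of the same inputs (each row sums to 1).

    Returns H(marginal of view 2) + E[log q(z2 | z1)], where q is taken
    (as an illustrative choice) to be the categorical distribution p1.
    """
    # Entropy term: entropy of the empirical marginal over clusters.
    marginal = p2.mean(axis=0)
    entropy = -np.sum(marginal * np.log(marginal + eps))
    # Reconstruction term: expected log-likelihood of view-2 assignments
    # under the distribution predicted from view 1 (a cross-entropy).
    recon = np.mean(np.sum(p2 * np.log(p1 + eps), axis=1))
    return entropy + recon
```

With perfectly aligned one-hot assignments spread uniformly over K clusters, the reconstruction term is near zero and the objective approaches its maximum log K; a collapsed solution (all inputs in one cluster) drives the entropy term to zero, which is the failure mode the entropy term penalizes.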