

The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning

July 20, 2023
作者: Borja Rodríguez-Gálvez, Arno Blaas, Pau Rodríguez, Adam Goliński, Xavier Suau, Jason Ramapuram, Dan Busbridge, Luca Zappella
cs.AI

Abstract

The mechanisms behind the success of multi-view self-supervised learning (MVSSL) are not yet fully understood. Contrastive MVSSL methods have been studied through the lens of InfoNCE, a lower bound of the Mutual Information (MI). However, the relation between other MVSSL methods and MI remains unclear. We consider a different lower bound on the MI consisting of an entropy and a reconstruction term (ER), and analyze the main MVSSL families through its lens. Through this ER bound, we show that clustering-based methods such as DeepCluster and SwAV maximize the MI. We also re-interpret the mechanisms of distillation-based approaches such as BYOL and DINO, showing that they explicitly maximize the reconstruction term and implicitly encourage a stable entropy, and we confirm this empirically. We show that replacing the objectives of common MVSSL methods with this ER bound achieves competitive performance, while making them stable when training with smaller batch sizes or smaller exponential moving average (EMA) coefficients. GitHub repo: https://github.com/apple/ml-entropy-reconstruction.
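The entropy-and-reconstruction (ER) decomposition referenced in the abstract can be sketched as a standard variational (Barber–Agakov-style) lower bound on mutual information; the notation below is an assumption for illustration, not copied from the paper:

```latex
% Let Z_1, Z_2 be the representations of two views of the same input,
% and let q(z_1 | z_2) be a variational "reconstruction" distribution.
% Since KL divergence is non-negative, the conditional entropy is upper
% bounded by the variational cross-entropy, giving:
\begin{equation}
  I(Z_1; Z_2)
  = H(Z_1) - H(Z_1 \mid Z_2)
  \;\ge\;
  \underbrace{H(Z_1)}_{\text{entropy}}
  \;+\;
  \underbrace{\mathbb{E}_{p(z_1, z_2)}\!\left[\log q(z_1 \mid z_2)\right]}_{\text{reconstruction}}
\end{equation}
```

Under this reading, the entropy term penalizes representational collapse (a collapsed encoder has low entropy), while the reconstruction term rewards agreement between the two views' representations.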