The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning

Abstract

The mechanisms behind the success of multi-view self-supervised learning (MVSSL) are not yet fully understood. Contrastive MVSSL methods have been studied through the lens of InfoNCE, a lower bound on the mutual information (MI). However, the relation between other MVSSL methods and MI remains unclear. We consider a different lower bound on the MI, consisting of an entropy term and a reconstruction term (ER), and analyze the main MVSSL families through its lens. Through this ER bound, we show that clustering-based methods such as DeepCluster and SwAV maximize the MI. We also reinterpret the mechanisms of distillation-based approaches such as BYOL and DINO, showing that they explicitly maximize the reconstruction term and implicitly encourage a stable entropy, and we confirm this empirically. We show that replacing the objectives of common MVSSL methods with this ER bound achieves competitive performance while keeping training stable with smaller batch sizes or smaller exponential moving average (EMA) coefficients.

GitHub repository: https://github.com/apple/ml-entropy-reconstruction
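
For readers unfamiliar with bounds of this kind, the following is a minimal sketch of how an entropy-plus-reconstruction lower bound on the MI can arise. The notation is an assumption made for illustration and may differ from the paper's: Z_1 and Z_2 denote the representations of two views, and q is an arbitrary variational approximation of the true conditional p(z_2 | z_1).

```latex
\begin{align}
I(Z_1; Z_2)
  &= H(Z_2) - H(Z_2 \mid Z_1) \\
  &= H(Z_2) + \mathbb{E}_{p(z_1, z_2)}\big[\log p(z_2 \mid z_1)\big] \\
  &\geq \underbrace{H(Z_2)}_{\text{entropy}}
     + \underbrace{\mathbb{E}_{p(z_1, z_2)}\big[\log q(z_2 \mid z_1)\big]}_{\text{reconstruction}}
\end{align}
```

The inequality holds because the gap between the last two lines is the expected Kullback-Leibler divergence between p(z_2 | z_1) and q(z_2 | z_1), which is non-negative. The bound is tight when q matches the true conditional.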
