InfoNCE Induces Gaussian Distribution

Abstract

Contrastive learning has become a cornerstone of modern representation learning, enabling training on massive amounts of unlabeled data for both task-specific and general (foundation) models. A prototypical loss in contrastive training is InfoNCE, together with its variants. In this work, we show that the InfoNCE objective induces Gaussian structure in the representations that emerge from contrastive training. We establish this result in two complementary regimes. First, we show that under certain alignment and concentration assumptions, projections of the high-dimensional representation asymptotically approach a multivariate Gaussian distribution. Second, under less strict assumptions, we show that adding a small, asymptotically vanishing regularization term that promotes low feature norm and high feature entropy leads to similar asymptotic results. We support our analysis with experiments on synthetic data and CIFAR-10 across multiple encoder architectures and sizes, which show consistent Gaussian behavior. This perspective offers a principled explanation for the commonly observed Gaussianity of contrastive representations. The resulting Gaussian model enables analytical treatment of learned representations and is expected to support a wide range of applications in contrastive learning.
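For readers unfamiliar with the objective, the following is a minimal sketch of the InfoNCE loss in its commonly used form: cosine similarities between L2-normalized embeddings, scaled by a temperature, with the other items in the batch serving as negatives. The abstract does not specify which variant is analyzed, so the function names, the temperature value, and the toy data below are illustrative assumptions rather than the authors' setup.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss for a batch of paired embeddings.

    anchors[i] and positives[i] form a positive pair; every positives[j]
    with j != i serves as a negative for anchors[i].
    The temperature default is an assumed, illustrative value.
    """
    anchors = [normalize(a) for a in anchors]
    positives = [normalize(p) for p in positives]
    total = 0.0
    for i, a in enumerate(anchors):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(a, p)) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the true positive.
        total += log_denominator - logits[i]
    return total / len(anchors)


# Toy data (hypothetical): random 8-dimensional embeddings for a batch of 16.
rng = random.Random(0)
batch = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]
# Positives are lightly perturbed copies of the anchors, a stand-in for augmentation.
views = [[x + 0.05 * rng.gauss(0.0, 1.0) for x in v] for v in batch]
# Rotating the list by one position breaks the pairing, for comparison.
shuffled = views[1:] + views[:1]

aligned_loss = info_nce(batch, views)
misaligned_loss = info_nce(batch, shuffled)
print(f"aligned: {aligned_loss:.3f}, misaligned: {misaligned_loss:.3f}")
```

Correctly paired views give a much lower loss than mismatched ones, which is the alignment pressure the abstract refers to. When every embedding is identical, the loss reduces to the logarithm of the batch size.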

