On the Relationship Between Representation Geometry and Generalization in Deep Neural Networks

Abstract

We investigate the relationship between representation geometry and neural network performance. Analyzing 52 pretrained ImageNet models from 13 architecture families, we show that effective dimension, an unsupervised geometric metric, strongly predicts accuracy. Output effective dimension achieves a partial correlation of r=0.75 (p < 10^(-10)) after controlling for model capacity, while total compression achieves a partial correlation of r=-0.72. These findings replicate across ImageNet and CIFAR-10 and generalize to NLP: effective dimension predicts performance for 8 encoder models on SST-2/MNLI and for 15 decoder-only LLMs on AG News (r=0.69, p=0.004), while model size does not (r=0.07). We establish bidirectional causality: degrading geometry by injecting noise causes accuracy loss (r=-0.94, p < 10^(-9)), while improving geometry via PCA maintains accuracy across architectures (a change of -0.03 percentage points at 95% variance). This relationship is noise-type agnostic: Gaussian, uniform, dropout, and salt-and-pepper noise all show |r| > 0.90. These results establish that effective dimension, computed entirely without labels, provides domain-agnostic predictive and causal information about neural network performance.
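The abstract does not spell out how effective dimension or the partial correlation is computed. As a minimal sketch, the code below assumes effective dimension is the participation ratio of the covariance eigenvalues, (sum of eigenvalues)^2 / (sum of squared eigenvalues), and assumes the partial correlation is obtained by linearly regressing the control variable out of both quantities. Both definitions, and the function names `effective_dimension` and `partial_correlation`, are illustrative assumptions rather than the authors' confirmed procedure.

```python
import numpy as np


def effective_dimension(features: np.ndarray) -> float:
    """Participation ratio of the covariance spectrum (assumed definition).

    features: array of shape (n_samples, n_features), e.g. layer activations.
    No labels are needed, which matches the unsupervised framing.
    """
    centered = features - features.mean(axis=0, keepdims=True)
    # Squared singular values of the centered data equal the covariance
    # eigenvalues up to a constant factor, which cancels in the ratio below.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    eigenvalues = singular_values ** 2
    total = eigenvalues.sum()
    if total == 0.0:
        return 0.0  # constant features carry no variance
    return float(total ** 2 / (eigenvalues ** 2).sum())


def partial_correlation(x, y, control) -> float:
    """Pearson correlation of x and y after regressing `control` out of both.

    Hypothetical helper: `control` could be a capacity proxy such as
    log parameter count.
    """
    x, y, control = (np.asarray(a, dtype=float) for a in (x, y, control))
    # Design matrix with an intercept column and the control variable.
    design = np.column_stack([np.ones_like(control), control])
    residual_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    residual_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(residual_x, residual_y)[0, 1])
```

Under these assumptions, one would compute `effective_dimension` on each model's output representations, then call `partial_correlation` with the per-model effective dimensions, the accuracies, and a capacity measure as the control. The participation ratio ranges from 1 (all variance along one direction) up to the number of feature dimensions (variance spread evenly).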

