On the Relationship Between Representation Geometry and Generalization in Deep Neural Networks
January 28, 2026
Author: Sumit Yadav
cs.AI
Abstract
We investigate the relationship between representation geometry and neural network performance. Analyzing 52 pretrained ImageNet models across 13 architecture families, we show that effective dimension -- an unsupervised geometric metric -- strongly predicts accuracy. Output effective dimension achieves a partial correlation of r=0.75 (p < 10^(-10)) after controlling for model capacity, while total compression achieves partial r=-0.72. These findings replicate across ImageNet and CIFAR-10, and generalize to NLP: effective dimension predicts performance for 8 encoder models on SST-2/MNLI and for 15 decoder-only LLMs on AG News (r=0.69, p=0.004), while model size does not (r=0.07). We establish bidirectional causality: degrading geometry via noise injection causes accuracy loss (r=-0.94, p < 10^(-9)), while improving geometry via PCA maintains accuracy across architectures (a drop of only 0.03 percentage points at 95% retained variance). This relationship is noise-type agnostic -- Gaussian, uniform, dropout, and salt-and-pepper noise all show |r| > 0.90. These results establish that effective dimension provides domain-agnostic predictive and causal information about neural network performance, computed entirely without labels.
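The abstract does not spell out how effective dimension is computed. A common unsupervised formulation in the representation-geometry literature is the participation ratio of the covariance eigenvalues of a layer's activations, which requires no labels; the sketch below assumes that definition (the function name `effective_dimension` and the synthetic data are illustrative, not from the paper):

```python
import numpy as np

def effective_dimension(X):
    """Participation ratio of the covariance spectrum of a representation
    matrix X (rows = samples, columns = features).

    ED = (sum(lambda_i))^2 / sum(lambda_i^2), where lambda_i are the
    eigenvalues of the feature covariance. Ranges from 1 (all variance in
    one direction) up to min(n_samples, n_features) (isotropic variance).
    """
    Xc = X - X.mean(axis=0)                        # center each feature
    s = np.linalg.svd(Xc, compute_uv=False)        # singular values of centered data
    lam = s**2 / (X.shape[0] - 1)                  # covariance eigenvalues
    return lam.sum() ** 2 / (lam**2).sum()

rng = np.random.default_rng(0)

# Isotropic 50-d Gaussian: variance spread evenly, so ED is close to 50.
X_iso = rng.standard_normal((2000, 50))

# Rank-1 data: all variance along one direction, so ED is exactly 1.
X_rank1 = np.outer(rng.standard_normal(2000), rng.standard_normal(50))

print(effective_dimension(X_iso))    # close to 50
print(effective_dimension(X_rank1))  # close to 1
```

Because this quantity depends only on the activation covariance, it can be evaluated on unlabeled inputs, consistent with the paper's claim that the predictor is computed entirely without labels.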