On the Relationship Between Representation Geometry and Generalization in Deep Neural Networks
January 28, 2026
Author: Sumit Yadav
cs.AI
Abstract
We investigate the relationship between representation geometry and neural network performance. Analyzing 52 pretrained ImageNet models across 13 architecture families, we show that effective dimension -- an unsupervised geometric metric -- strongly predicts accuracy. Output effective dimension achieves a partial correlation of r=0.75 (p < 10^(-10)) after controlling for model capacity, while total compression achieves partial r=-0.72. These findings replicate across ImageNet and CIFAR-10, and generalize to NLP: effective dimension predicts performance for 8 encoder models on SST-2/MNLI and for 15 decoder-only LLMs on AG News (r=0.69, p=0.004), while model size does not (r=0.07). We establish bidirectional causality: degrading geometry via noise injection causes accuracy loss (r=-0.94, p < 10^(-9)), while improving geometry via PCA maintains accuracy across architectures (a drop of only 0.03 percentage points at 95% retained variance). This relationship is noise-type agnostic -- Gaussian, uniform, dropout, and salt-and-pepper noise all show |r| > 0.90. These results establish that effective dimension provides domain-agnostic predictive and causal information about neural network performance, computed entirely without labels.
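The abstract does not spell out how effective dimension is computed. A common unsupervised formulation in the representation-geometry literature is the participation ratio of the covariance eigenvalues of a layer's activations, which requires no labels; the sketch below assumes that definition (the function name `effective_dimension` and the synthetic data are illustrative, not from the paper):

```python
import numpy as np

def effective_dimension(X):
    """Participation ratio of the covariance spectrum of a representation
    matrix X (rows = samples, columns = features).

    ED = (sum(lambda_i))^2 / sum(lambda_i^2), where lambda_i are the
    eigenvalues of the feature covariance. Ranges from 1 (all variance in
    one direction) up to min(n_samples, n_features) (isotropic variance).
    """
    Xc = X - X.mean(axis=0)                        # center each feature
    s = np.linalg.svd(Xc, compute_uv=False)        # singular values of centered data
    lam = s**2 / (X.shape[0] - 1)                  # covariance eigenvalues
    return lam.sum() ** 2 / (lam**2).sum()

rng = np.random.default_rng(0)

# Isotropic 50-d Gaussian: variance spread evenly, so ED is close to 50.
X_iso = rng.standard_normal((2000, 50))

# Rank-1 data: all variance along one direction, so ED is exactly 1.
X_rank1 = np.outer(rng.standard_normal(2000), rng.standard_normal(50))

print(effective_dimension(X_iso))    # close to 50
print(effective_dimension(X_rank1))  # close to 1
```

Because this quantity depends only on the activation covariance, it can be evaluated on unlabeled inputs, consistent with the paper's claim that the predictor is computed entirely without labels.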