
On the Relationship Between Representation Geometry and Generalization in Deep Neural Networks

January 28, 2026
Author: Sumit Yadav
cs.AI

Abstract

We investigate the relationship between representation geometry and neural network performance. Analyzing 52 pretrained ImageNet models across 13 architecture families, we show that effective dimension -- an unsupervised geometric metric -- strongly predicts accuracy. Output effective dimension achieves a partial correlation of r = 0.75 (p < 10^(-10)) after controlling for model capacity, while total compression achieves a partial correlation of r = -0.72. These findings replicate across ImageNet and CIFAR-10, and generalize to NLP: effective dimension predicts performance for 8 encoder models on SST-2/MNLI and for 15 decoder-only LLMs on AG News (r = 0.69, p = 0.004), while model size does not (r = 0.07). We establish bidirectional causality: degrading geometry via noise injection causes accuracy loss (r = -0.94, p < 10^(-9)), while improving geometry via PCA preserves accuracy across architectures (a change of only -0.03 percentage points when retaining 95% of variance). The relationship is noise-type agnostic -- Gaussian, uniform, dropout, and salt-and-pepper noise all show |r| > 0.90. These results establish that effective dimension provides domain-agnostic, predictive, and causal information about neural network performance, computed entirely without labels.
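As a concrete illustration, below is a minimal Python sketch of how such a label-free metric can be computed. It assumes effective dimension is defined as the participation ratio of the representation covariance spectrum and that capacity is controlled for via linear partial correlation; the paper's exact estimators may differ, and all names here are illustrative rather than the authors' code.

```python
# Minimal sketch (not the authors' code). Assumptions:
#   - "effective dimension" = participation ratio of the covariance eigenvalues
#   - capacity control = linear partial correlation (regress out, then Pearson r)
import numpy as np
from scipy import stats

def effective_dimension(features: np.ndarray) -> float:
    """Participation ratio of the covariance spectrum of `features`
    (shape: n_samples x n_dims). No labels are needed."""
    x = features - features.mean(axis=0, keepdims=True)
    cov = x.T @ x / (x.shape[0] - 1)
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)  # drop tiny negative values
    return float(eigvals.sum() ** 2 / (eigvals ** 2).sum())

def partial_correlation(x, y, control):
    """Pearson r and p-value between x and y after linearly regressing
    the control variable (e.g. log parameter count) out of both."""
    x, y, c = (np.asarray(v, dtype=float) for v in (x, y, control))
    A = np.column_stack([np.ones_like(c), c])  # intercept + control regressor
    resid = lambda v: v - A @ np.linalg.lstsq(A, v, rcond=None)[0]
    return stats.pearsonr(resid(x), resid(y))
```

Under these assumptions, calling `effective_dimension` on a model's output-layer activations for a batch of unlabeled inputs yields the geometric predictor, and `partial_correlation(effective_dims, accuracies, log_param_counts)` mirrors the capacity-controlled analysis described above (`log_param_counts` is a hypothetical capacity proxy, not a quantity named in the paper).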