Measuring Model Robustness via Fisher Information: Spectral Bounds, Theoretical Guarantees, and Practical Algorithms

Abstract

The robustness of deep neural networks is crucial for safety-critical deployments, yet existing evaluation methods are often attack-dependent and lack interpretability. We propose a principled, attack-agnostic robustness metric based on the spectral norm of the Fisher Information Matrix (FIM), which quantifies the worst-case sensitivity of the model's output distribution to input perturbations. Theoretically, we establish that the FIM equals the variance of the input Jacobian and derive closed-form spectral bounds for common architectures, including VGG, ResNet, DenseNet, and Transformer, providing the first theoretical robustness ranking. To enable scalable evaluation, we develop efficient algorithms, including power iteration and Hutchinson-based estimation, that support both white-box and black-box settings. Extensive experiments on multiple datasets (CIFAR, ImageNet, and medical images) and multiple architectures show a strong correlation between our metric and adversarial vulnerability. Our framework serves as an interpretable diagnostic tool that complements attack-based evaluations, offering insights into architectural sensitivity and guiding the design of more robust models. Code is available at: https://github.com/franz-chang/SRP/.
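
To make the two estimators named above concrete, the following is a minimal white-box sketch, not the implementation in the SRP repository. It assumes a toy softmax linear classifier with logits Wx + b, for which the input-space FIM has the standard form F(x) = W^T (diag(p) - p p^T) W with p = softmax(Wx + b). All function names (`fim_matvec`, `fim_spectral_norm`, `fim_trace_hutchinson`) are hypothetical and chosen only for illustration. Both estimators touch F only through matrix-vector products, which is what would let the same idea scale to deep networks via automatic differentiation.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax of a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def fim_matvec(W, p, v):
    """Return F @ v for F = W^T (diag(p) - p p^T) W without forming F.

    Assumption: the model is a softmax linear classifier, so the
    Jacobian of the logits with respect to the input is simply W.
    """
    u = W @ v                    # push v through the logit Jacobian
    u = p * u - p * (p @ u)      # apply diag(p) - p p^T
    return W.T @ u               # pull back to input space


def fim_spectral_norm(W, p, num_iters=500, seed=0):
    """Estimate the largest eigenvalue of F by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(W.shape[1])
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(num_iters):
        w = fim_matvec(W, p, v)
        lam = float(v @ w)       # Rayleigh quotient for the current v
        norm = np.linalg.norm(w)
        if norm == 0.0:          # v lies in the null space of F
            break
        v = w / norm
    return lam


def fim_trace_hutchinson(W, p, num_probes=2000, seed=0):
    """Estimate trace(F) with Rademacher probes (Hutchinson estimator)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=W.shape[1])
        total += z @ fim_matvec(W, p, z)
    return total / num_probes


# Toy example: 10 classes, 32 input features, random parameters.
rng = np.random.default_rng(42)
W = rng.standard_normal((10, 32)) / np.sqrt(32)
b = rng.standard_normal(10)
x = rng.standard_normal(32)
p = softmax(W @ x + b)

lam_max = fim_spectral_norm(W, p)      # spectral-norm estimate
trace_est = fim_trace_hutchinson(W, p) # stochastic trace estimate
print(f"spectral norm ~ {lam_max:.6f}, trace ~ {trace_est:.6f}")
```

Under these assumptions, a larger `lam_max` means that some input direction changes the predictive distribution more sharply, which is the sensitivity the proposed metric is meant to capture. The trace estimate is a cheaper, averaged counterpart and always upper-bounds the spectral norm for a positive semidefinite F.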