Measuring Model Robustness via Fisher Information: Spectral Bounds, Theoretical Guarantees, and Practical Algorithms

Abstract

The robustness of deep neural networks is crucial for safety-critical deployments, yet existing evaluation methods are often attack-dependent and lack interpretability. We propose a principled, attack-agnostic robustness metric based on the spectral norm of the Fisher Information Matrix (FIM), which quantifies the worst-case sensitivity of the model's output distribution to input perturbations. Theoretically, we establish that the FIM equals the variance of the input Jacobian and derive closed-form spectral bounds for common architectures, including VGG, ResNet, DenseNet, and Transformer, providing the first theoretical robustness ranking of these architectures. To enable scalable evaluation, we develop efficient algorithms, including power iteration and Hutchinson-based estimation, that support both white-box and black-box settings. Extensive experiments on multiple datasets (CIFAR, ImageNet, and medical images) and multiple architectures show a strong correlation between our metric and adversarial vulnerability. Our framework serves as an interpretable diagnostic tool that complements attack-based evaluations, offering insight into architectural sensitivity and guiding the design of more robust models. Code is available at: https://github.com/franz-chang/SRP/.
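To make the central quantity concrete, the following is a minimal sketch of estimating the spectral norm of an input-space FIM by power iteration. It is not the authors' implementation (see the repository above for that). It assumes a toy linear softmax classifier p(y|x) = softmax(Wx + b), for which the input FIM has the closed form F(x) = Wᵀ(diag(p) − ppᵀ)W. The names `softmax`, `fim_matvec`, and `fim_spectral_norm` are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax of a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def fim_matvec(W, p, v):
    """Return F @ v for F = W^T (diag(p) - p p^T) W without forming F.

    Assumption: the model is a linear softmax classifier, so the Jacobian
    of the logits with respect to the input is simply W.
    """
    u = W @ v                  # push v through the logit Jacobian
    au = p * u - p * (p @ u)   # apply diag(p) - p p^T
    return W.T @ au            # pull back to input space


def fim_spectral_norm(W, b, x, iters=1000, tol=1e-10, seed=0):
    """Estimate the largest eigenvalue of the input FIM at x by power iteration."""
    rng = np.random.default_rng(seed)
    p = softmax(W @ x + b)
    v = rng.standard_normal(x.shape[0])
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        fv = fim_matvec(W, p, v)
        norm = np.linalg.norm(fv)
        if norm == 0.0:        # F is zero along v; treat the norm as 0
            return 0.0
        v = fv / norm
        new_lam = v @ fim_matvec(W, p, v)   # Rayleigh quotient of the new iterate
        converged = abs(new_lam - lam) <= tol * max(1.0, abs(new_lam))
        lam = new_lam
        if converged:
            break
    return lam


# Usage on random toy data (5 classes, 20 input dimensions).
rng = np.random.default_rng(1)
W = rng.standard_normal((5, 20))
b = rng.standard_normal(5)
x = rng.standard_normal(20)
print("estimated spectral norm of the input FIM:", fim_spectral_norm(W, b, x))
```

Only matrix-vector products with F are needed, which is what makes power iteration attractive when F is too large to materialize. For a deep network, the products with W and Wᵀ would be replaced by Jacobian-vector and vector-Jacobian products obtained from automatic differentiation; that substitution is an assumption of this sketch, not a description of the paper's algorithm.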