Measuring Model Robustness via Fisher Information: Spectral Bounds, Theoretical Guarantees, and Practical Algorithms

Abstract

The robustness of deep neural networks is crucial for safety-critical deployments, yet existing evaluation methods are often attack-dependent and lack interpretability. We propose a principled, attack-agnostic robustness metric based on the spectral norm of the Fisher Information Matrix (FIM), which quantifies the worst-case sensitivity of the model's output distribution to input perturbations. Theoretically, we establish that the FIM equals the variance of the input Jacobian and derive closed-form spectral bounds for common architectures, including VGG, ResNet, DenseNet, and Transformer, providing the first theoretical robustness ranking. To enable scalable evaluation, we develop efficient algorithms, including power iteration and Hutchinson-based estimation, that support both white-box and black-box settings. Extensive experiments across multiple datasets, including CIFAR, ImageNet, and medical images, and across multiple architectures show a strong correlation between our metric and adversarial vulnerability. Our framework serves as an interpretable diagnostic tool that complements attack-based evaluations, offering insights into architectural sensitivity and guiding the design of more robust models. Code is available at: https://github.com/franz-chang/SRP/.
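As a rough illustration of the two estimators named above, the following sketch applies power iteration and a Hutchinson trace estimate to the input-space FIM of a toy linear softmax classifier. This is not the authors' implementation: the model, the helper names (`fim_matvec`, `fim_spectral_norm`, `fim_trace_hutchinson`), and all sizes are assumptions chosen so that the FIM has the closed form F = Wᵀ(diag(p) − ppᵀ)W and can be applied to a vector without being formed explicitly. For a deep network, the same matrix-vector product would presumably be obtained from automatic differentiation instead.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                  # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum()

def fim_matvec(W, p, v):
    """Return F v for F = W^T (diag(p) - p p^T) W without forming F.

    Assumes a linear softmax model, so the logit Jacobian w.r.t. x is W.
    """
    u = W @ v                        # push v through the logit Jacobian
    u = p * u - p * (p @ u)          # apply diag(p) - p p^T
    return W.T @ u                   # pull the result back to input space

def fim_spectral_norm(W, p, n_iters=1000, seed=0):
    """Power-iteration estimate of the largest eigenvalue of F."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(W.shape[1])
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(n_iters):
        w = fim_matvec(W, p, v)
        norm = np.linalg.norm(w)
        if norm == 0.0:              # v lies in the null space of F; stop
            break
        v = w / norm
        lam = v @ fim_matvec(W, p, v)    # Rayleigh quotient at the new v
    return lam

def fim_trace_hutchinson(W, p, n_samples=2000, seed=0):
    """Hutchinson estimate of trace(F) using Rademacher probe vectors."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        z = rng.choice([-1.0, 1.0], size=W.shape[1])
        total += z @ fim_matvec(W, p, z)
    return total / n_samples

# Hypothetical toy problem: 10 classes, 32 input dimensions.
rng = np.random.default_rng(42)
W = rng.standard_normal((10, 32))
b = rng.standard_normal(10)
x = rng.standard_normal(32)
p = softmax(W @ x + b)               # predictive distribution at x

lam_max = fim_spectral_norm(W, p)    # worst-case sensitivity at x
trace_est = fim_trace_hutchinson(W, p)   # total sensitivity at x
print(f"spectral norm ~ {lam_max:.4f}, trace ~ {trace_est:.4f}")
```

Both estimators touch F only through `fim_matvec`, which is what makes this style of evaluation scale to inputs where the full matrix would be too large to store. Because F is positive semidefinite, the spectral norm is bounded above by the trace, which gives a cheap sanity check on the two numbers.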