Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips

Abstract

Deep Neural Networks (DNNs) can be catastrophically disrupted by flipping only a handful of parameter bits. We introduce Deep Neural Lesion (DNL), a data-free and optimization-free method that locates critical parameters, and an enhanced single-pass variant, 1P-DNL, that refines this selection with one forward and backward pass on random inputs. We show that this vulnerability spans multiple domains, including image classification, object detection, instance segmentation, and reasoning large language models. In image classification, flipping just two sign bits in ResNet-50 on ImageNet reduces accuracy by 99.8%. In object detection and instance segmentation, one or two sign flips in the backbone collapse COCO detection and mask AP for Mask R-CNN and YOLOv8-seg models. In language modeling, two sign flips injected into different experts reduce the accuracy of Qwen3-30B-A3B-Thinking from 78% to 0%. We also show that selectively protecting a small fraction of vulnerable sign bits provides a practical defense against such attacks.
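
To make the basic fault concrete, the following is a minimal sketch of what a single sign-bit flip does to one stored parameter. It assumes the weights are held as IEEE 754 float32 values, where bit 31 is the sign bit. The helper name `flip_sign_bit` and the choice of index are hypothetical illustrations. The sketch does not implement DNL or 1P-DNL, and it says nothing about how those methods choose which parameters to target.

```python
import numpy as np


def flip_sign_bit(weights: np.ndarray, flat_index: int) -> np.ndarray:
    """Return a float32 copy of `weights` with the sign bit of one element flipped.

    Hypothetical helper for illustration only; the target index is supplied
    by the caller and is not selected by any method from the paper.
    """
    flipped = np.ascontiguousarray(weights, dtype=np.float32).copy()
    # Reinterpret the same bytes as unsigned 32-bit integers (no data copy).
    bits = flipped.view(np.uint32).reshape(-1)
    # Bit 31 of an IEEE 754 float32 is the sign bit; XOR toggles it.
    bits[flat_index] ^= np.uint32(0x80000000)
    return flipped


# Toy 2x2 "weight matrix"; index 2 is the element at row 1, column 0.
w = np.array([[0.5, -1.25], [3.0, 0.0]], dtype=np.float32)
w_attacked = flip_sign_bit(w, 2)

# Count how many stored bits differ between the two tensors.
changed_bits = int(
    sum(bin(int(x)).count("1") for x in (w.view(np.uint32) ^ w_attacked.view(np.uint32)).ravel())
)
```

In this toy case exactly one stored bit changes, yet the affected weight keeps its magnitude and reverses its sign, which is why a single flip on a large-magnitude parameter can be far more damaging than a flip in a low-order mantissa bit.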