MANI-Pure: Magnitude-Adaptive Noise Injection for Adversarial Purification

Abstract

Adversarial purification with diffusion models has emerged as a promising defense strategy, but existing methods typically rely on uniform noise injection, which indiscriminately perturbs all frequencies, corrupting semantic structures and undermining robustness. Our empirical study reveals that adversarial perturbations are not uniformly distributed: they are predominantly concentrated in high-frequency regions, with heterogeneous magnitude intensity patterns that vary across frequencies and attack types. Motivated by this observation, we introduce MANI-Pure, a magnitude-adaptive purification framework that leverages the magnitude spectrum of inputs to guide the purification process. Instead of injecting homogeneous noise, MANI-Pure adaptively applies heterogeneous, frequency-targeted noise, effectively suppressing adversarial perturbations in fragile high-frequency, low-magnitude bands while preserving semantically critical low-frequency content. Extensive experiments on CIFAR-10 and ImageNet-1K validate the effectiveness of MANI-Pure. It narrows the clean accuracy gap to within 0.59 of the original classifier, while boosting robust accuracy by 2.15, and achieves the top-1 robust accuracy on the RobustBench leaderboard, surpassing the previous state-of-the-art method.
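The contrast between uniform and frequency-targeted noise can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `magnitude_adaptive_noise`, the weighting rule (normalized radial frequency multiplied by one minus the normalized log-magnitude), the parameter `sigma`, and the restriction to a single 2D grayscale image are all assumptions made purely for illustration. The abstract states only that noise is steered toward high-frequency, low-magnitude bands; it does not give the exact rule, and the diffusion-based purification step is omitted here.

```python
import numpy as np


def magnitude_adaptive_noise(image, sigma=0.1, eps=1e-8, seed=0):
    """Hypothetical sketch: add Gaussian noise whose spectrum is reshaped so that
    high-frequency, low-magnitude Fourier bins receive most of the noise energy.

    `image` is assumed to be a 2D float array (one grayscale channel).
    Returns the noised image and the per-bin weight map that was used.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape

    # Magnitude spectrum of the input image.
    magnitude = np.abs(np.fft.fft2(image))

    # Radial frequency of every Fourier bin, normalized to [0, 1].
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx**2 + fy**2)
    radius = radius / radius.max()

    # Assumed weighting rule (not from the paper): the weight grows with
    # frequency and shrinks as the log-magnitude of the bin grows.
    log_mag = np.log1p(magnitude)
    weight = radius * (1.0 - log_mag / (log_mag.max() + eps))

    # Rescale to unit mean square so the shaped noise has roughly unit variance
    # and `sigma` keeps its usual meaning as the overall noise level.
    weight = weight / (np.sqrt(np.mean(weight**2)) + eps)

    # Shape white Gaussian noise in the Fourier domain. The weight map is
    # symmetric for a real-valued image, so the inverse transform is real
    # up to rounding error.
    white = rng.standard_normal((h, w))
    shaped = np.fft.ifft2(np.fft.fft2(white) * weight).real

    return image + sigma * shaped, weight
```

Under these assumptions, a smooth image yields a weight map that is zero at the DC bin and largest near the Nyquist corner, so low-frequency content is left almost untouched while the noise budget is spent on the bands where the abstract says perturbations concentrate. A uniform scheme would correspond to setting `weight` to one everywhere.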
