ChatPaper.ai


In-the-Wild Camouflage Attack on Vehicle Detectors through Controllable Image Editing

March 19, 2026
作者: Xiao Fang, Yiming Gong, Stanislav Panev, Celso de Melo, Shuowen Hu, Shayok Chakraborty, Fernando De la Torre
cs.AI

Abstract

Deep neural networks (DNNs) have achieved remarkable success in computer vision but remain highly vulnerable to adversarial attacks. Among them, camouflage attacks manipulate an object's visible appearance to deceive detectors while remaining stealthy to humans. In this paper, we propose a new framework that formulates vehicle camouflage attacks as a conditional image-editing problem. Specifically, we explore both image-level and scene-level camouflage generation strategies, and fine-tune a ControlNet to synthesize camouflaged vehicles directly on real images. We design a unified objective that jointly enforces vehicle structural fidelity, style consistency, and adversarial effectiveness. Extensive experiments on the COCO and LINZ datasets show that our method achieves significantly stronger attack effectiveness, leading to a drop of more than 38% in AP50, while better preserving vehicle structure and improving human-perceived stealthiness compared to existing approaches. Furthermore, our framework generalizes effectively to unseen black-box detectors and exhibits promising transferability to the physical world. The project page is available at https://humansensinglab.github.io/CtrlCamo
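The abstract describes a unified training objective that balances three terms: vehicle structural fidelity, style consistency with the scene, and adversarial effectiveness against the detector. The paper does not publish the exact loss in this summary, so the following is only a minimal sketch of how such a weighted combination might look; the function name, weights, and the interpretation of each term are assumptions, not the authors' implementation.

```python
# Hedged sketch of a unified camouflage objective (all names and default
# weights are illustrative assumptions): the camouflage generator would be
# trained to minimize a weighted sum of the three terms named in the abstract.

def unified_objective(l_struct, l_style, l_adv,
                      w_struct=1.0, w_style=1.0, w_adv=1.0):
    """Combine three scalar (batch-averaged) loss terms into one objective.

    l_struct: penalizes deviation from the vehicle's original structure
              (e.g. a reconstruction loss on the ControlNet conditioning).
    l_style:  keeps the camouflage texture consistent with the scene
              (e.g. a perceptual/style loss against the surroundings).
    l_adv:    adversarial term that suppresses the detector's confidence
              on the camouflaged vehicle.
    """
    return w_struct * l_struct + w_style * l_style + w_adv * l_adv

# Toy usage with scalar stand-ins for the three batch-averaged losses,
# up-weighting the adversarial term:
loss = unified_objective(0.2, 0.1, 0.5, w_adv=2.0)
print(loss)
```

In practice each term would be computed per training batch during ControlNet fine-tuning, and the weights would be tuned to trade off attack strength against visual stealthiness.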