In-the-Wild Camouflage Attack on Vehicle Detectors through Controllable Image Editing

Abstract

Deep neural networks (DNNs) have achieved remarkable success in computer vision but remain highly vulnerable to adversarial attacks. Among these, camouflage attacks manipulate an object's visible appearance to deceive detectors while remaining stealthy to humans. In this paper, we propose a new framework that formulates vehicle camouflage attacks as a conditional image-editing problem. Specifically, we explore both image-level and scene-level camouflage generation strategies, and fine-tune a ControlNet to synthesize camouflaged vehicles directly on real images. We design a unified objective that jointly enforces vehicle structural fidelity, style consistency, and adversarial effectiveness. Extensive experiments on the COCO and LINZ datasets show that our method achieves significantly stronger attack effectiveness than existing approaches, with an AP50 decrease of more than 38%, while better preserving vehicle structure and improving human-perceived stealthiness. Furthermore, our framework generalizes effectively to unseen black-box detectors and exhibits promising transferability to the physical world. The project page is available at https://humansensinglab.github.io/CtrlCamo
