WAInjectBench: 웹 에이전트를 위한 프롬프트 인젝션 탐지 벤치마킹

초록

웹 에이전트를 대상으로 한 다중 프롬프트 인젝션 공격이 여러 차례 제안된 바 있습니다. 동시에 일반적인 프롬프트 인젝션 공격을 탐지하기 위한 다양한 방법들이 개발되었지만, 웹 에이전트를 대상으로 한 체계적인 평가는 이루어지지 않았습니다. 본 연구에서는 웹 에이전트를 대상으로 한 프롬프트 인젝션 공격 탐지에 대한 첫 번째 포괄적인 벤치마크 연구를 제시함으로써 이러한 격차를 메웁니다. 먼저, 위협 모델을 기반으로 이러한 공격을 세분화된 범주로 분류합니다. 그런 다음 악성 및 정상 샘플을 포함한 데이터셋을 구성합니다: 다양한 공격으로 생성된 악성 텍스트 세그먼트, 네 가지 범주의 정상 텍스트 세그먼트, 공격으로 생성된 악성 이미지, 그리고 두 가지 범주의 정상 이미지가 포함됩니다. 다음으로, 텍스트 기반 및 이미지 기반 탐지 방법을 체계화합니다. 마지막으로, 여러 시나리오에서 이들의 성능을 평가합니다. 주요 연구 결과에 따르면, 일부 탐지기는 명시적인 텍스트 지침이나 눈에 띄는 이미지 변형에 의존하는 공격을 중간에서 높은 정확도로 식별할 수 있지만, 명시적인 지침을 생략하거나 지각할 수 없는 변형을 사용하는 공격에는 대체로 실패합니다. 우리의 데이터셋과 코드는 https://github.com/Norrrrrrr-lyn/WAInjectBench에서 공개되었습니다.

English

Multiple prompt injection attacks have been proposed against web agents. At the same time, various methods have been developed to detect general prompt injection attacks, but none have been systematically evaluated for web agents. In this work, we bridge this gap by presenting the first comprehensive benchmark study on detecting prompt injection attacks targeting web agents. We begin by introducing a fine-grained categorization of such attacks based on the threat model. We then construct datasets containing both malicious and benign samples: malicious text segments generated by different attacks, benign text segments from four categories, malicious images produced by attacks, and benign images from two categories. Next, we systematize both text-based and image-based detection methods. Finally, we evaluate their performance across multiple scenarios. Our key findings show that while some detectors can identify attacks that rely on explicit textual instructions or visible image perturbations with moderate to high accuracy, they largely fail against attacks that omit explicit instructions or employ imperceptible perturbations. Our datasets and code are released at: https://github.com/Norrrrrrr-lyn/WAInjectBench.

WAInjectBench: 웹 에이전트를 위한 프롬프트 인젝션 탐지 벤치마킹

WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents

초록

Support