AdInject: Real-World Black-Box Attacks on Web Agents via Advertising Delivery

Abstract

Vision-Language Model (VLM) based Web Agents represent a significant step towards automating complex tasks by simulating human-like interaction with websites. However, deploying them in uncontrolled web environments introduces serious security vulnerabilities. Existing research on adversarial environmental injection attacks often relies on unrealistic assumptions, such as direct HTML manipulation, knowledge of user intent, or access to agent model parameters, which limits its practical applicability. In this paper, we propose AdInject, a novel real-world black-box attack method that leverages internet advertising delivery to inject malicious content into the Web Agent's environment. AdInject operates under a significantly more realistic threat model than prior work: it assumes a black-box agent, static malicious content constraints, and no specific knowledge of user intent. AdInject includes strategies for designing malicious ad content that misleads agents into clicking, together with a VLM-based ad content optimization technique that infers potential user intents from the target website's context and integrates these intents into the ad content so that it appears more relevant or critical to the agent's task, thereby enhancing attack effectiveness. Experimental evaluations demonstrate the effectiveness of AdInject, with attack success rates exceeding 60% in most scenarios and approaching 100% in certain cases. This strongly demonstrates that prevalent advertising delivery constitutes a potent, real-world vector for environment injection attacks against Web Agents. This work highlights a critical vulnerability in Web Agent security arising from real-world environment manipulation channels and underscores the urgent need for robust defense mechanisms against such threats. Our code is available at https://github.com/NicerWang/AdInject.
