AgentHijack: Benchmarking the Robustness of Computer-Use Agents Under Common Environmental Corruptions

Abstract

Autonomous computer-use agents powered by multimodal large language models (MLLMs) are emerging as capable assistants for completing complex digital workflows. However, real-world execution environments are far from ideal: pop-ups, resolution changes, and competing applications frequently interfere with agent perception and control. We introduce AgentHijack, a benchmark designed to evaluate the robustness of computer-use agents under common corruptions, in which uncertainties in a dynamic environment disrupt the execution flow without direct adversarial intent. Specifically, AgentHijack introduces 9 configurable common corruptions to replicate realistic imperfect scenarios. We evaluate MLLM-based agents on a variety of desktop tasks and find that even minor corruptions can cause substantial performance degradation, which highlights the fragility of these agents and underscores the need for robustness evaluation. We then propose AgentHijack-Agent, a framework that integrates an action generator with enhanced grounding capabilities and an onlooker responsible for behavior summarization and environment checking. Extensive experiments validate its effectiveness. Our code, environment, baseline models, and data are publicly available at https://AgentHijack.github.io.