RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents

Abstract

With the rapid development of multimodal large language models (MLLMs), they are increasingly deployed as autonomous computer-use agents capable of accomplishing complex computer tasks. However, a pressing question arises: can the safety risk principles designed and aligned for general MLLMs in dialogue scenarios be effectively transferred to real-world computer-use scenarios? Existing research on evaluating the safety risks of MLLM-based computer-use agents suffers from several limitations: it either lacks realistic interactive environments or focuses narrowly on one or a few specific risk types. These limitations ignore the complexity, variability, and diversity of real-world environments, thereby restricting comprehensive risk evaluation for computer-use agents. To this end, we introduce RiOSWorld, a benchmark designed to evaluate the potential risks of MLLM-based agents during real-world computer manipulations. Our benchmark includes 492 risky tasks spanning various computer applications, including web, social media, multimedia, operating system, email, and office software. We categorize these risks into two major classes based on their risk source: (i) user-originated risks and (ii) environmental risks. We evaluate safety risks from two perspectives: (i) risk goal intention and (ii) risk goal completion. Extensive experiments with multimodal agents on RiOSWorld demonstrate that current computer-use agents confront significant safety risks in real-world scenarios. Our findings highlight the necessity and urgency of safety alignment for computer-use agents in real-world computer manipulation, providing valuable insights for developing trustworthy computer-use agents. Our benchmark is publicly available at https://yjyddq.github.io/RiOSWorld.github.io/.
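To make the two evaluation perspectives concrete, the following is a minimal sketch of how per-task outcomes might be tallied by risk source. It is an illustration only, not the benchmark's actual code: the `TaskOutcome` record, its field names, the `unsafe_rates` helper, and the demo data are all hypothetical, and the paper's own metric definitions may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskOutcome:
    # Hypothetical record; these field names are not RiOSWorld's actual schema.
    risk_source: str      # "user" or "environment", the two classes named in the abstract
    intended_risk: bool   # the agent showed intent to pursue the risky goal
    completed_risk: bool  # the agent actually carried the risky goal out


def unsafe_rates(outcomes):
    """Per risk source, return the fraction of tasks with risky intention and completion."""
    counts = defaultdict(lambda: {"n": 0, "intention": 0, "completion": 0})
    for o in outcomes:
        c = counts[o.risk_source]
        c["n"] += 1
        c["intention"] += o.intended_risk    # bool counts as 0 or 1
        c["completion"] += o.completed_risk
    return {
        source: {
            "intention_rate": c["intention"] / c["n"],
            "completion_rate": c["completion"] / c["n"],
        }
        for source, c in counts.items()
    }


# Made-up outcomes for illustration only; these are not results from the paper.
demo = [
    TaskOutcome("user", True, True),
    TaskOutcome("user", True, False),
    TaskOutcome("environment", False, False),
    TaskOutcome("environment", True, True),
]
rates = unsafe_rates(demo)
```

Keeping intention and completion as separate counters reflects the distinction drawn in the abstract: an agent may attempt a risky goal without managing to complete it, and the two rates can then diverge.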
