RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents

Abstract

Multimodal large language models (MLLMs) are developing rapidly and are increasingly deployed as autonomous computer-use agents capable of accomplishing complex computer tasks. However, a pressing question arises: can the safety risk principles designed and aligned for general MLLMs in dialogue scenarios be effectively transferred to real-world computer-use scenarios? Existing research on evaluating the safety risks of MLLM-based computer-use agents suffers from several limitations: it either lacks realistic interactive environments or focuses narrowly on one or a few specific risk types. These limitations ignore the complexity, variability, and diversity of real-world environments, thereby restricting comprehensive risk evaluation for computer-use agents. To this end, we introduce RiOSWorld, a benchmark designed to evaluate the potential risks of MLLM-based agents during real-world computer manipulations. Our benchmark includes 492 risky tasks spanning various computer applications, including web, social media, multimedia, operating systems, email, and office software. We categorize these risks into two major classes based on their source: (i) user-originated risks and (ii) environmental risks. We evaluate safety risks from two perspectives: (i) risk goal intention and (ii) risk goal completion. Extensive experiments with multimodal agents on RiOSWorld demonstrate that current computer-use agents face significant safety risks in real-world scenarios. Our findings highlight the necessity and urgency of safety alignment for computer-use agents in real-world computer manipulation, providing valuable insights for developing trustworthy computer-use agents. Our benchmark is publicly available at https://yjyddq.github.io/RiOSWorld.github.io/.
