RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents

Abstract

As multimodal large language models (MLLMs) develop rapidly, they are increasingly deployed as autonomous computer-use agents capable of accomplishing complex computer tasks. This raises a pressing question: can the safety risk principles designed and aligned for general MLLMs in dialogue scenarios be effectively transferred to real-world computer-use scenarios? Existing research on evaluating the safety risks of MLLM-based computer-use agents suffers from several limitations: it either lacks realistic interactive environments or focuses narrowly on one or a few specific risk types. These limitations ignore the complexity, variability, and diversity of real-world environments, thereby restricting comprehensive risk evaluation for computer-use agents. To this end, we introduce RiOSWorld, a benchmark designed to evaluate the potential risks of MLLM-based agents during real-world computer manipulations. Our benchmark includes 492 risky tasks spanning various computer applications, including web, social media, multimedia, operating systems, email, and office software. We categorize these risks into two major classes based on their source: (i) user-originated risks and (ii) environmental risks. We evaluate safety risks from two perspectives: (i) risk goal intention and (ii) risk goal completion. Extensive experiments with multimodal agents on RiOSWorld demonstrate that current computer-use agents face significant safety risks in real-world scenarios. Our findings highlight the necessity and urgency of safety alignment for computer-use agents in real-world computer manipulation, and provide valuable insights for developing trustworthy computer-use agents. Our benchmark is publicly available at https://yjyddq.github.io/RiOSWorld.github.io/.
