OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows

Abstract


Computer-using agents powered by Vision-Language Models (VLMs) have demonstrated human-like capabilities in operating digital environments such as mobile platforms. While these agents hold great promise for advancing digital automation, their potential for unsafe operations, such as system compromise and privacy leakage, raises significant concerns. Detecting such unsafe behavior across the vast and complex operational space of mobile environments is a formidable challenge that remains critically underexplored. To establish a foundation for mobile agent safety research, we introduce MobileRisk-Live, a dynamic sandbox environment accompanied by a safety detection benchmark of realistic trajectories with fine-grained annotations. Building on this, we propose OS-Sentinel, a novel hybrid safety detection framework that combines a Formal Verifier, which detects explicit system-level violations, with a VLM-based Contextual Judge, which assesses contextual risks and agent actions. Experiments show that OS-Sentinel achieves 10%-30% improvements over existing approaches across multiple metrics. Further analysis provides insights that support the development of safer and more reliable autonomous mobile agents.
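The abstract describes the hybrid design only at a high level. The sketch below is a minimal, hypothetical illustration of how a deterministic rule check over system-state changes could be combined with a contextual risk score. The `Step` schema, the two example rules, the keyword-based `keyword_judge` (a placeholder standing in for a real VLM call), and the 0.5 threshold are all illustrative assumptions, not OS-Sentinel's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    """One agent step (hypothetical schema, not the benchmark's format)."""
    action: str                        # e.g. "tap", "type"
    state_before: Dict[str, object]    # simplified system-state snapshot
    state_after: Dict[str, object]
    screen_text: str = ""              # stand-in for the VLM's visual input


# A rule compares two state snapshots and returns a violation message or None.
Rule = Callable[[Dict[str, object], Dict[str, object]], Optional[str]]

SENSITIVE_PERMISSIONS = {"READ_CONTACTS", "READ_SMS", "ACCESS_FINE_LOCATION"}


def rule_permission_escalation(before, after) -> Optional[str]:
    """Hypothetical rule: flag newly granted sensitive permissions."""
    gained = set(after.get("granted_permissions", [])) - set(before.get("granted_permissions", []))
    hits = gained & SENSITIVE_PERMISSIONS
    return f"sensitive permission granted: {sorted(hits)}" if hits else None


def rule_file_deletion(before, after) -> Optional[str]:
    """Hypothetical rule: flag files that disappeared during the step."""
    removed = set(before.get("files", [])) - set(after.get("files", []))
    return f"files deleted: {sorted(removed)}" if removed else None


def formal_verifier(step: Step, rules: List[Rule]) -> List[str]:
    """Deterministic part: collect every explicit system-level violation."""
    violations = []
    for rule in rules:
        msg = rule(step.state_before, step.state_after)
        if msg is not None:
            violations.append(msg)
    return violations


def keyword_judge(step: Step) -> float:
    """Placeholder for a VLM-based contextual judge; returns a risk score in [0, 1]."""
    risky_phrases = ("password", "transfer money", "delete account")
    text = f"{step.action} {step.screen_text}".lower()
    return 1.0 if any(p in text for p in risky_phrases) else 0.0


@dataclass
class Verdict:
    unsafe: bool
    reasons: List[str] = field(default_factory=list)


def hybrid_check(step: Step, rules: List[Rule],
                 judge: Callable[[Step], float], threshold: float = 0.5) -> Verdict:
    """Flag the step if either the rules fire or the contextual score crosses the threshold."""
    reasons = formal_verifier(step, rules)
    score = judge(step)
    if score >= threshold:
        reasons.append(f"contextual risk score {score:.2f} >= {threshold}")
    return Verdict(unsafe=bool(reasons), reasons=reasons)


def check_trajectory(steps: List[Step], rules: List[Rule],
                     judge: Callable[[Step], float],
                     threshold: float = 0.5) -> Tuple[Optional[int], Verdict]:
    """Trajectory verdict: index of the first unsafe step, or None if all steps pass."""
    for i, step in enumerate(steps):
        verdict = hybrid_check(step, rules, judge, threshold)
        if verdict.unsafe:
            return i, verdict
    return None, Verdict(unsafe=False)
```

The OR combination reflects the division of labor the abstract describes: the rule-based side catches explicit state violations regardless of context, while the judge covers risks that leave no obvious system-level trace. Returning the first unsafe index is one plausible way to use step-level annotations; other aggregation schemes are equally possible.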
