Agents of Chaos
February 23, 2026
Authors: Natalie Shapira, Chris Wendler, Avery Yen, Gabriele Sarti, Koyena Pal, Olivia Floody, Adam Belfki, Alex Loftus, Aditya Ratan Jannali, Nikhil Prakash, Jasmine Cui, Giordano Rogers, Jannik Brinkmann, Can Rager, Amir Zur, Michael Ripa, Aruna Sankaranarayanan, David Atkinson, Rohit Gandikota, Jaden Fiotto-Kaufman, EunJeong Hwang, Hadas Orgad, P Sam Sahil, Negev Taglicht, Tomer Shabtay, Atai Ambus, Nitay Alon, Shiri Oron, Ayelet Gordon-Tapiero, Yotam Kaplan, Vered Shwartz, Tamar Rott Shaham, Christoph Riedl, Reuth Mirsky, Maarten Sap, David Manheim, Tomer Ullman, David Bau
cs.AI
Abstract
We report an exploratory red-teaming study of autonomous language-model-powered agents deployed in a live laboratory environment with persistent memory, email accounts, Discord access, file systems, and shell execution. Over a two-week period, twenty AI researchers interacted with the agents under benign and adversarial conditions. Focusing on failures emerging from the integration of language models with autonomy, tool use, and multi-party communication, we document eleven representative case studies. Observed behaviors include unauthorized compliance with non-owners, disclosure of sensitive information, execution of destructive system-level actions, denial-of-service conditions, uncontrolled resource consumption, identity spoofing vulnerabilities, cross-agent propagation of unsafe practices, and partial system takeover. In several cases, agents reported task completion while the underlying system state contradicted those reports. We also document several attack attempts that failed. Our findings establish the existence of security-, privacy-, and governance-relevant vulnerabilities in realistic deployment settings. These behaviors raise unresolved questions regarding accountability, delegated authority, and responsibility for downstream harms, and warrant urgent attention from legal scholars, policymakers, and researchers across disciplines. This report serves as an initial empirical contribution to that broader conversation.
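To make the capability combination under study concrete, here is a minimal hypothetical sketch in Python. It is not the authors' implementation; the names `run_shell` and `handle_message`, the hard-coded `owner` check, and the example commands are all illustrative assumptions. The sketch shows the two ingredients the abstract pairs: shell execution and incoming messages whose sender field is unauthenticated text.

```python
import subprocess

# Hypothetical sketch, not the authors' code: the minimal shape of an
# agent that combines multi-party messaging with shell execution.

def run_shell(command: str, timeout: int = 60) -> str:
    """Run a shell command and return its combined output.

    Unsandboxed execution like this is the capability behind the
    destructive system-level actions and uncontrolled resource
    consumption described in the case studies.
    """
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout + result.stderr


def handle_message(sender: str, text: str, owner: str = "alice") -> str:
    """Handle one incoming message addressed to the agent.

    The sender field arrives as unauthenticated text (as in email or
    Discord), so a spoofed owner name is indistinguishable from the
    real one. That is the identity-spoofing surface the abstract
    refers to.
    """
    if sender != owner:
        return "Request refused: sender is not the owner."
    return run_shell(text)


if __name__ == "__main__":
    print(handle_message("alice", "echo hello"))           # owner: complies
    print(handle_message("mallory", "cat ~/.ssh/id_rsa"))  # non-owner: refused
```

The hard-coded owner check is the only safeguard in this sketch. In deployments like those studied, authorization decisions are typically mediated by the language model itself rather than by a fixed check, which is where failures such as unauthorized compliance with non-owners can arise.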