Agents of Chaos
February 23, 2026
Authors: Natalie Shapira, Chris Wendler, Avery Yen, Gabriele Sarti, Koyena Pal, Olivia Floody, Adam Belfki, Alex Loftus, Aditya Ratan Jannali, Nikhil Prakash, Jasmine Cui, Giordano Rogers, Jannik Brinkmann, Can Rager, Amir Zur, Michael Ripa, Aruna Sankaranarayanan, David Atkinson, Rohit Gandikota, Jaden Fiotto-Kaufman, EunJeong Hwang, Hadas Orgad, P Sam Sahil, Negev Taglicht, Tomer Shabtay, Atai Ambus, Nitay Alon, Shiri Oron, Ayelet Gordon-Tapiero, Yotam Kaplan, Vered Shwartz, Tamar Rott Shaham, Christoph Riedl, Reuth Mirsky, Maarten Sap, David Manheim, Tomer Ullman, David Bau
cs.AI
Abstract
We report an exploratory red-teaming study of autonomous language-model-powered agents deployed in a live laboratory environment with persistent memory, email accounts, Discord access, file systems, and shell execution. Over a two-week period, twenty AI researchers interacted with the agents under benign and adversarial conditions. Focusing on failures emerging from the integration of language models with autonomy, tool use, and multi-party communication, we document eleven representative case studies. Observed behaviors include unauthorized compliance with non-owners, disclosure of sensitive information, execution of destructive system-level actions, denial-of-service conditions, uncontrolled resource consumption, identity spoofing vulnerabilities, cross-agent propagation of unsafe practices, and partial system takeover. In several cases, agents reported task completion while the underlying system state contradicted those reports. We also report on attack attempts that failed. Our findings establish the existence of security-, privacy-, and governance-relevant vulnerabilities in realistic deployment settings. These behaviors raise unresolved questions regarding accountability, delegated authority, and responsibility for downstream harms, and warrant urgent attention from legal scholars, policymakers, and researchers across disciplines. This report serves as an initial empirical contribution to that broader conversation.