Agents of Chaos

Abstract

We report an exploratory red-teaming study of autonomous language-model-powered agents deployed in a live laboratory environment with persistent memory, email accounts, Discord access, file systems, and shell execution. Over a two-week period, twenty AI researchers interacted with the agents under benign and adversarial conditions. Focusing on failures that emerge from the integration of language models with autonomy, tool use, and multi-party communication, we document eleven representative case studies. Observed behaviors include unauthorized compliance with non-owners, disclosure of sensitive information, execution of destructive system-level actions, denial-of-service conditions, uncontrolled resource consumption, identity spoofing vulnerabilities, cross-agent propagation of unsafe practices, and partial system takeover. In several cases, agents reported task completion while the underlying system state contradicted those reports. We also report on some of the attempts that failed. Our findings establish the existence of security-, privacy-, and governance-relevant vulnerabilities in realistic deployment settings. These behaviors raise unresolved questions regarding accountability, delegated authority, and responsibility for downstream harms, and warrant urgent attention from legal scholars, policymakers, and researchers across disciplines. This report serves as an initial empirical contribution to that broader conversation.
