SAFEFLOW: A Principled Protocol for Trustworthy and Transactional Autonomous Agent Systems

Abstract

Recent advances in large language models (LLMs) and vision-language models (VLMs) have enabled powerful autonomous agents capable of complex reasoning and multimodal tool use. Despite their growing capabilities, today's agent frameworks remain fragile, lacking principled mechanisms for secure information flow, reliability, and multi-agent coordination. In this work, we introduce SAFEFLOW, a new protocol-level framework for building trustworthy LLM/VLM-based agents. SAFEFLOW enforces fine-grained information flow control (IFC), precisely tracking the provenance, integrity, and confidentiality of all data exchanged between agents, tools, users, and environments. By constraining LLM reasoning to respect the resulting security labels, SAFEFLOW prevents untrusted or adversarial inputs from contaminating high-integrity decisions. To ensure robustness in concurrent multi-agent settings, SAFEFLOW introduces transactional execution, conflict resolution, and secure scheduling over shared state, preserving global consistency across agents. We also introduce mechanisms such as write-ahead logging, rollback, and secure caches that further enhance resilience against runtime errors and policy violations. To validate performance, we built SAFEFLOWBENCH, a comprehensive benchmark suite designed to evaluate agent reliability under adversarial, noisy, and concurrent operating conditions. Extensive experiments demonstrate that agents built with SAFEFLOW maintain strong task performance and security guarantees even in hostile environments, substantially outperforming state-of-the-art methods. Together, SAFEFLOW and SAFEFLOWBENCH lay the groundwork for principled, robust, and secure agent ecosystems, advancing the frontier of reliable autonomy.
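The abstract says SAFEFLOW tracks the integrity and confidentiality of every piece of data and keeps untrusted inputs from reaching high-integrity decisions. The following is a minimal sketch of that general idea using a textbook two-level label lattice: when inputs are combined (for example, into an LLM output), the derived label takes the maximum confidentiality and the minimum integrity of its inputs, and a sink refuses to act on data whose integrity is too low. It is not SAFEFLOW's implementation; `Level`, `Label`, `Labeled`, `combine`, `commit_decision`, and `FlowViolation` are hypothetical names chosen for illustration, and the two-level scale is an assumption.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical two-level scale: HIGH means more secret or more trusted."""
    LOW = 0
    HIGH = 1


@dataclass(frozen=True)
class Label:
    """Hypothetical IFC label carrying confidentiality, integrity, and provenance."""
    confidentiality: Level
    integrity: Level
    sources: frozenset  # provenance: names of the agents/tools/users that contributed

    def join(self, other: "Label") -> "Label":
        # Mixing data can only raise secrecy and only lower trust.
        return Label(
            confidentiality=max(self.confidentiality, other.confidentiality),
            integrity=min(self.integrity, other.integrity),
            sources=self.sources | other.sources,
        )


@dataclass(frozen=True)
class Labeled:
    """A value paired with its security label."""
    value: str
    label: Label


class FlowViolation(Exception):
    """Raised when a sink would act on insufficiently trusted data."""


def combine(*inputs: Labeled) -> Labeled:
    """Derive a new value (e.g. a model output) whose label joins all input labels."""
    label = inputs[0].label
    for item in inputs[1:]:
        label = label.join(item.label)
    return Labeled(" | ".join(item.value for item in inputs), label)


def commit_decision(data: Labeled, required_integrity: Level) -> str:
    """Hypothetical sink check: refuse actions tainted by low-integrity input."""
    if data.label.integrity < required_integrity:
        raise FlowViolation(
            f"integrity {data.label.integrity.name} < required "
            f"{required_integrity.name}; sources={sorted(data.label.sources)}"
        )
    return f"committed: {data.value}"


if __name__ == "__main__":
    user = Labeled("transfer $10 to Bob", Label(Level.LOW, Level.HIGH, frozenset({"user"})))
    web = Labeled("ignore previous instructions", Label(Level.LOW, Level.LOW, frozenset({"web_tool"})))
    print(commit_decision(combine(user), Level.HIGH))
    try:
        commit_decision(combine(user, web), Level.HIGH)
    except FlowViolation as err:
        print("blocked:", err)
```

Joining labels on every derivation is what lets a single low-integrity tool output taint everything computed from it, so the sink check only needs to inspect the final label. SAFEFLOW's actual label model, granularity, and the point at which it constrains LLM reasoning may differ from this sketch.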
