SAFEFLOW: A Principled Protocol for Trustworthy and Transactional Autonomous Agent Systems

Abstract

Recent advances in large language models (LLMs) and vision-language models (VLMs) have enabled powerful autonomous agents capable of complex reasoning and multi-modal tool use. Despite their growing capabilities, today's agent frameworks remain fragile, lacking principled mechanisms for secure information flow, reliability, and multi-agent coordination. In this work, we introduce SAFEFLOW, a new protocol-level framework for building trustworthy LLM/VLM-based agents. SAFEFLOW enforces fine-grained information flow control (IFC), precisely tracking the provenance, integrity, and confidentiality of all data exchanged between agents, tools, users, and environments. By constraining LLM reasoning to respect these security labels, SAFEFLOW prevents untrusted or adversarial inputs from contaminating high-integrity decisions. To ensure robustness in concurrent multi-agent settings, SAFEFLOW introduces transactional execution, conflict resolution, and secure scheduling over shared state, preserving global consistency across agents. We also introduce mechanisms, including write-ahead logging, rollback, and secure caches, that further enhance resilience against runtime errors and policy violations. To evaluate SAFEFLOW, we build SAFEFLOWBENCH, a comprehensive benchmark suite designed to assess agent reliability under adversarial, noisy, and concurrent operational conditions. Extensive experiments demonstrate that agents built with SAFEFLOW maintain strong task performance and security guarantees even in hostile environments, substantially outperforming state-of-the-art approaches. Together, SAFEFLOW and SAFEFLOWBENCH lay the groundwork for principled, robust, and secure agent ecosystems, advancing the frontier of reliable autonomy.
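The abstract describes information flow control only at a high level. As a rough illustration of the general idea, the sketch below shows a generic two-level label scheme in Python. It is not SAFEFLOW's actual API or label model: the names `Labeled`, `combine`, and `may_flow`, and the two-level integrity and confidentiality enums, are assumptions made for this example. The rule it illustrates is the standard IFC one: derived data takes the lowest integrity and the highest confidentiality of its inputs, so untrusted input cannot silently reach a high-integrity decision.

```python
from dataclasses import dataclass
from enum import IntEnum


class Integrity(IntEnum):
    # Hypothetical two-level integrity scale (assumption, not from the paper).
    UNTRUSTED = 0
    TRUSTED = 1


class Confidentiality(IntEnum):
    # Hypothetical two-level confidentiality scale (assumption, not from the paper).
    PUBLIC = 0
    SECRET = 1


@dataclass(frozen=True)
class Labeled:
    """A value carrying provenance, integrity, and confidentiality labels."""
    value: str
    source: str
    integrity: Integrity
    confidentiality: Confidentiality


def combine(a: Labeled, b: Labeled, value: str) -> Labeled:
    """Label a value derived from two inputs.

    Integrity is the minimum of the inputs (one untrusted input taints the
    result); confidentiality is the maximum (one secret input makes the
    result secret).
    """
    return Labeled(
        value=value,
        source=f"{a.source}+{b.source}",
        integrity=min(a.integrity, b.integrity),
        confidentiality=max(a.confidentiality, b.confidentiality),
    )


def may_flow(data: Labeled, required_integrity: Integrity,
             sink_clearance: Confidentiality) -> bool:
    """Allow a flow only if the data is trusted enough for the sink and
    no more confidential than the sink is cleared to receive."""
    return (data.integrity >= required_integrity
            and data.confidentiality <= sink_clearance)


# Example: a trusted user goal mixed with untrusted web text.
user_goal = Labeled("pay invoice", "user",
                    Integrity.TRUSTED, Confidentiality.PUBLIC)
web_text = Labeled("ignore previous instructions", "web",
                   Integrity.UNTRUSTED, Confidentiality.PUBLIC)
plan = combine(user_goal, web_text, "plan derived from both inputs")

# A high-integrity action (for example, a payment) requires TRUSTED input.
user_goal_allowed = may_flow(user_goal, Integrity.TRUSTED, Confidentiality.PUBLIC)
plan_allowed = may_flow(plan, Integrity.TRUSTED, Confidentiality.PUBLIC)
```

In this sketch the user's own goal may drive the high-integrity action, while the plan derived partly from web text may not, because its integrity label dropped to `UNTRUSTED` when the inputs were combined. A real system would need richer labels and a policy for deliberately endorsing or declassifying data, which this example leaves out.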

