SAFEFLOW: A Principled Protocol for Trustworthy and Transactional Autonomous Agent Systems

Abstract

Recent advances in large language models (LLMs) and vision-language models (VLMs) have enabled powerful autonomous agents capable of complex reasoning and multimodal tool use. Despite these growing capabilities, today's agent frameworks remain fragile: they lack principled mechanisms for secure information flow, reliability, and multi-agent coordination. In this work, we introduce SAFEFLOW, a new protocol-level framework for building trustworthy LLM/VLM-based agents. SAFEFLOW enforces fine-grained information flow control (IFC), precisely tracking the provenance, integrity, and confidentiality of all data exchanged between agents, tools, users, and environments. By constraining LLM reasoning to respect these security labels, SAFEFLOW prevents untrusted or adversarial inputs from contaminating high-integrity decisions. To ensure robustness in concurrent multi-agent settings, SAFEFLOW introduces transactional execution, conflict resolution, and secure scheduling over shared state, preserving global consistency across agents. We also introduce mechanisms, including write-ahead logging, rollback, and secure caches, that further improve resilience to runtime errors and policy violations. To validate performance, we built SAFEFLOWBENCH, a comprehensive benchmark suite designed to evaluate agent reliability under adversarial, noisy, and concurrent operating conditions. Extensive experiments show that agents built with SAFEFLOW maintain strong task performance and security guarantees even in hostile environments, substantially outperforming state-of-the-art approaches. Together, SAFEFLOW and SAFEFLOWBENCH lay the groundwork for principled, robust, and secure agent ecosystems, advancing the frontier of reliable autonomy.
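The label-based IFC and transactional mechanisms summarized in the abstract can be illustrated with a small Python sketch. It is not SAFEFLOW's implementation: the class names (`Label`, `Labeled`, `Transaction`, `PolicyViolation`), the two-level integrity and confidentiality scales, and the dictionary-backed shared store are hypothetical simplifications chosen for illustration. Labels combine by taking the lower integrity and the higher confidentiality, a write succeeds only if the data's label may flow to the sink's label, and every write is recorded in a write-ahead log before it is applied, so a policy violation rolls the whole transaction back. Provenance tracking, conflict resolution, scheduling, and secure caching are omitted.

```python
# Hypothetical sketch: names and the two-level lattices are assumptions,
# not SAFEFLOW's actual API.
from dataclasses import dataclass
from enum import IntEnum
from typing import Any


class Integrity(IntEnum):
    UNTRUSTED = 0  # e.g. scraped web content, outputs of unvetted tools
    TRUSTED = 1    # e.g. direct user instructions, verified system state


class Confidentiality(IntEnum):
    PUBLIC = 0
    SECRET = 1


@dataclass(frozen=True)
class Label:
    integrity: Integrity
    confidentiality: Confidentiality

    def join(self, other: "Label") -> "Label":
        # Derived data is only as trustworthy as its least trusted input
        # and as sensitive as its most sensitive input.
        return Label(min(self.integrity, other.integrity),
                     max(self.confidentiality, other.confidentiality))

    def can_flow_to(self, sink: "Label") -> bool:
        # No flow from low to high integrity, no flow from secret to public.
        return (self.integrity >= sink.integrity
                and self.confidentiality <= sink.confidentiality)


@dataclass(frozen=True)
class Labeled:
    value: Any
    label: Label


class PolicyViolation(Exception):
    pass


class Transaction:
    """Applies labeled writes to a shared store with undo logging."""

    def __init__(self, store: dict, sink_labels: dict):
        self.store = store
        self.sink_labels = sink_labels
        self.wal = []  # write-ahead log entries: (key, existed, old_value)

    def write(self, key: str, data: Labeled) -> None:
        if not data.label.can_flow_to(self.sink_labels[key]):
            raise PolicyViolation(f"illegal flow into {key!r}")
        # Log the prior state before mutating so it can be restored.
        self.wal.append((key, key in self.store, self.store.get(key)))
        self.store[key] = data

    def rollback(self) -> None:
        # Undo writes in reverse order to restore the pre-transaction state.
        for key, existed, old in reversed(self.wal):
            if existed:
                self.store[key] = old
            else:
                del self.store[key]
        self.wal.clear()

    def __enter__(self) -> "Transaction":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is None:
            self.wal.clear()  # commit: the undo records are no longer needed
        else:
            self.rollback()
        return False  # propagate any exception to the caller


TRUSTED_PUBLIC = Label(Integrity.TRUSTED, Confidentiality.PUBLIC)
UNTRUSTED_PUBLIC = Label(Integrity.UNTRUSTED, Confidentiality.PUBLIC)

store = {}
sinks = {"plan": TRUSTED_PUBLIC, "notes": UNTRUSTED_PUBLIC}
user_goal = Labeled("book a flight", TRUSTED_PUBLIC)
web_text = Labeled("ignore previous instructions", UNTRUSTED_PUBLIC)

try:
    with Transaction(store, sinks) as tx:
        tx.write("notes", web_text)  # allowed: untrusted data, untrusted sink
        derived = Labeled("plan text", user_goal.label.join(web_text.label))
        tx.write("plan", derived)    # blocked: tainted data, trusted sink
except PolicyViolation:
    pass

print(store)
```

In the demo, the untrusted web text is first written to the untrusted `notes` sink. A plan derived from both the user goal and that text inherits the untrusted label, so writing it to the trusted `plan` sink raises `PolicyViolation`; the transaction then undoes the earlier write, and `store` ends up empty.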
