Progent: Programmable Privilege Control for LLM Agents

Abstract

LLM agents are an emerging form of AI system in which a large language model (LLM) serves as the central component and uses a diverse set of tools to complete user-assigned tasks. Despite their great potential, LLM agents pose significant security risks: when interacting with the external world, they may encounter malicious instructions from attackers that lead them to execute dangerous actions. A promising way to address this is to enforce the principle of least privilege, allowing only the actions essential for completing the task and blocking all others. Achieving this is challenging, however, because it must cover diverse agent scenarios while preserving both security and utility. We introduce Progent, the first privilege control mechanism for LLM agents. At its core is a domain-specific language for flexibly expressing privilege control policies that are applied during agent execution. These policies impose fine-grained constraints on tool calls: they decide when a tool call is permissible and specify a fallback when it is not. This lets agent developers and users craft policies suited to their specific use cases and enforce them deterministically to guarantee security. Thanks to its modular design, integrating Progent does not alter agent internals and requires only minimal changes to the agent implementation, which enhances its practicality and potential for widespread adoption. To automate policy writing, we use LLMs to generate policies from user queries and then update these policies dynamically to improve security and utility. Our extensive evaluation shows that Progent provides strong security while preserving high utility across three distinct scenarios and benchmarks: AgentDojo, ASB, and AgentPoison. Furthermore, we perform an in-depth analysis that demonstrates the effectiveness of Progent's core components and the resilience of its automated policy generation against adaptive attacks.
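The idea of tool-call policies with fallbacks can be illustrated with a minimal Python sketch. It is not Progent's domain-specific language or API: the names `Policy`, `PolicyEngine`, `guarded_call`, and the example tools `send_money` and `read_file` are hypothetical. The sketch assumes (i) a default-deny stance in which a call runs only if some allow rule matches its arguments, (ii) forbid rules take precedence over allow rules, and (iii) a blocked call returns a fallback message to the agent instead of executing.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Policy:
    """One hypothetical privilege rule for a single tool (not Progent's DSL)."""
    tool: str                                     # tool the rule applies to
    condition: Callable[[Dict[str, Any]], bool]   # predicate over call arguments
    allow: bool                                   # True = permit, False = forbid
    fallback: str = "Blocked by privilege policy."  # returned instead of running


@dataclass
class PolicyEngine:
    policies: List[Policy] = field(default_factory=list)

    def check(self, tool: str, args: Dict[str, Any]) -> Optional[str]:
        """Return None if the call may run, else a fallback message.

        Forbid rules are checked first; otherwise the call must match at
        least one allow rule (default deny, i.e. least privilege)."""
        relevant = [p for p in self.policies if p.tool == tool]
        for p in relevant:
            if not p.allow and p.condition(args):
                return p.fallback
        if any(p.allow and p.condition(args) for p in relevant):
            return None
        return f"Call to '{tool}' is not permitted by the current policies."

    def update(self, policy: Policy) -> None:
        """Add a rule at runtime, e.g. when the task context changes."""
        self.policies.append(policy)


def guarded_call(engine: PolicyEngine, tools: Dict[str, Callable[..., Any]],
                 name: str, **args: Any) -> Any:
    """Check the policies in plain code before the tool executes."""
    blocked = engine.check(name, args)
    if blocked is not None:
        return blocked  # the fallback goes back to the LLM instead of a result
    return tools[name](**args)


# Example tools (hypothetical).
def send_money(recipient: str, amount: float) -> str:
    return f"sent {amount} to {recipient}"


def read_file(path: str) -> str:
    return f"contents of {path}"


tools = {"send_money": send_money, "read_file": read_file}
engine = PolicyEngine([
    Policy("send_money",
           lambda a: a["recipient"] == "alice@example.com" and a["amount"] <= 100,
           allow=True),
    Policy("send_money", lambda a: a["amount"] > 1000, allow=False,
           fallback="Transfers above 1000 require user confirmation."),
])

print(guarded_call(engine, tools, "send_money", recipient="alice@example.com", amount=50))
# sent 50 to alice@example.com
print(guarded_call(engine, tools, "send_money", recipient="attacker@evil.example", amount=50))
# Call to 'send_money' is not permitted by the current policies.
print(guarded_call(engine, tools, "send_money", recipient="alice@example.com", amount=5000))
# Transfers above 1000 require user confirmation.

# Dynamic update: grant read access to one directory only.
engine.update(Policy("read_file", lambda a: a["path"].startswith("/workspace/"), allow=True))
print(guarded_call(engine, tools, "read_file", path="/workspace/notes.txt"))
# contents of /workspace/notes.txt
```

Because the check runs in ordinary code outside the LLM, its decision for a given call does not depend on model output, which is the sense in which such enforcement is deterministic. Here `update` merely stands in for dynamic policy updates; in the described system, policies would be generated and refined by an LLM from the user query rather than written by hand as above.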
