Agent libOS: A Library-OS-Inspired Runtime for Long-Running, Capability-Controlled LLM Agents

Abstract

Large language model (LLM) agents are evolving from request-response assistants into long-running software actors: they maintain state across model calls, fork subtasks, wait for external events, request human authority, generate tools, and perform side effects that must be resumed and audited. This paper presents Agent libOS, a library-OS-inspired runtime substrate for LLM agents. Agent libOS runs above a conventional host operating system; it does not implement hardware drivers, kernel-mode isolation, or a POSIX-compatible operating system. Instead, it treats an agent as an AgentProcess: a schedulable execution subject with process identity, parent-child lineage, lifecycle state, a tool table derived from an AgentImage, typed Object Memory, explicit capabilities, human queues, checkpoints, events, and audit records. Its central design rule is that tools are libc-like wrappers, while runtime primitives are the authority boundary. Filesystem access, object access, sleeps, human approval, JIT tool registration, and external side effects are checked at primitive boundaries under explicit capabilities and policy. We describe the design, threat model, Python prototype, and safety-oriented evaluation. The current prototype implements async scheduling, namespace-local Object Memory, runtime-integrated human approval, one-shot permission grants, per-process working directories, shell and image-registration primitives, Deno/TypeScript JIT tools over a libOS syscall broker, filesystem/object bridge tools, an injectable Resource Provider Substrate, deterministic demos, real-model smoke scripts, and 123 regression tests at the time of writing. Agent libOS does not aim to improve planner accuracy. Instead, it demonstrates a runtime substrate in which long-running LLM agents can be scheduled, authorized, resumed, and audited without treating tool dispatch as the trust boundary.
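
The central design rule can be illustrated with a minimal Python sketch. It is not taken from the Agent libOS prototype: the `Runtime` class, the `fs_write` and `fs_read` primitives, the capability strings, and the `save_note_tool` wrapper are all hypothetical names chosen for illustration, and the filesystem is an in-memory dictionary. The sketch only shows the shape of the idea, namely that the tool holds no authority of its own and that the capability check and audit record happen inside the runtime primitive.

```python
from dataclasses import dataclass, field


class CapabilityError(PermissionError):
    """Raised when a process lacks the capability a primitive requires."""


@dataclass
class AgentProcess:
    # Hypothetical, heavily simplified stand-in for the paper's AgentProcess.
    pid: int
    capabilities: set[str] = field(default_factory=set)
    audit_log: list[tuple[str, str, bool]] = field(default_factory=list)


class Runtime:
    """Illustrative runtime: every side effect passes through a primitive."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}  # in-memory stand-in for a filesystem

    def _check(self, proc: AgentProcess, capability: str, target: str) -> None:
        # The authority boundary: record the attempt, then allow or deny it.
        allowed = capability in proc.capabilities
        proc.audit_log.append((capability, target, allowed))
        if not allowed:
            raise CapabilityError(f"pid {proc.pid} lacks {capability!r}")

    def fs_write(self, proc: AgentProcess, path: str, data: str) -> None:
        self._check(proc, "fs.write", path)
        self.files[path] = data

    def fs_read(self, proc: AgentProcess, path: str) -> str:
        self._check(proc, "fs.read", path)
        return self.files[path]


def save_note_tool(runtime: Runtime, proc: AgentProcess, name: str, text: str) -> str:
    """libc-like wrapper: shapes arguments and delegates; it checks nothing itself."""
    path = f"notes/{name}.txt"
    runtime.fs_write(proc, path, text)
    return path


runtime = Runtime()
writer = AgentProcess(pid=1, capabilities={"fs.write"})
reader = AgentProcess(pid=2, capabilities={"fs.read"})

path = save_note_tool(runtime, writer, "todo", "ship it")

# The same tool invoked by a process without the write capability is denied
# at the primitive, and the denied attempt is still audited.
try:
    save_note_tool(runtime, reader, "todo", "overwrite")
except CapabilityError as exc:
    denied = str(exc)
```

Because the check lives in `fs_write` rather than in `save_note_tool`, a newly generated or buggy tool cannot widen a process's authority in this sketch; it can only call primitives that enforce the capabilities the process already holds.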