Agent libOS: A Library-OS-Inspired Runtime for Long-Running, Capability-Controlled LLM Agents

Abstract

Large language model (LLM) agents are evolving from request-response assistants into long-running software actors: they maintain state across model calls, fork subtasks, wait for external events, request human approval, generate tools, and perform side effects that must be resumable and auditable. This paper presents Agent libOS, a library-OS-inspired runtime substrate for LLM agents. Agent libOS runs above a conventional host operating system; it does not implement hardware drivers, kernel-mode isolation, or POSIX compatibility. Instead, it treats each agent as an AgentProcess: a schedulable execution subject with a process identity, parent-child lineage, lifecycle state, a tool table derived from an AgentImage, typed Object Memory, explicit capabilities, human queues, checkpoints, events, and audit records. Its central design rule is that tools are libc-like wrappers, while runtime primitives form the authority boundary. Filesystem access, object access, sleeps, human approval, JIT tool registration, and external side effects are checked at primitive boundaries against explicit capabilities and policy. We describe the design, threat model, Python prototype, and safety-oriented evaluation. At the time of writing, the prototype implements async scheduling, namespace-local Object Memory, runtime-integrated human approval, one-shot permission grants, per-process working directories, shell and image-registration primitives, Deno/TypeScript JIT tools that call through a libOS syscall broker, filesystem/object bridge tools, an injectable Resource Provider Substrate, deterministic demos, real-model smoke scripts, and a suite of 123 regression tests. Agent libOS does not aim to improve planner accuracy; rather, it demonstrates a runtime substrate in which long-running LLM agents can be scheduled, authorized, resumed, and audited without treating tool dispatch as the trust boundary.
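To make the central design rule concrete, here is a minimal Python sketch. It is not the prototype's code, and every name in it (the `AgentProcess` fields, `State`, `Runtime`, `fs_read`, `fs_write`, `append_line`, and the `"fs.write:<path>"` capability-string format) is a hypothetical stand-in. It illustrates only three ideas from the abstract: a tool that holds no authority of its own, primitives that check standing and one-shot capabilities, and an audit record for every authorization decision.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    """Hypothetical subset of AgentProcess lifecycle states."""
    READY = "ready"
    RUNNING = "running"
    WAITING_HUMAN = "waiting_human"
    EXITED = "exited"


class CapabilityError(PermissionError):
    """Raised when a primitive is invoked without a matching capability."""


@dataclass
class AgentProcess:
    pid: int
    parent: int | None                                   # parent-child lineage
    state: State = State.READY
    capabilities: set[str] = field(default_factory=set)  # standing grants (hypothetical format)
    one_shot: set[str] = field(default_factory=set)      # grants consumed on first use
    audit: list[tuple[int, str, bool]] = field(default_factory=list)


class Runtime:
    """Hypothetical runtime: primitives are the only place authority is checked."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}  # in-memory stand-in for the host filesystem

    def _check(self, proc: AgentProcess, cap: str) -> None:
        if cap in proc.capabilities:
            allowed = True
        elif cap in proc.one_shot:
            proc.one_shot.discard(cap)   # one-shot grant: valid for exactly one use
            allowed = True
        else:
            allowed = False
        proc.audit.append((proc.pid, cap, allowed))  # record every decision, allowed or not
        if not allowed:
            raise CapabilityError(f"pid {proc.pid} lacks {cap}")

    # Primitives: the authority boundary.
    def fs_read(self, proc: AgentProcess, path: str) -> str:
        self._check(proc, f"fs.read:{path}")
        return self.files[path]

    def fs_write(self, proc: AgentProcess, path: str, data: str) -> None:
        self._check(proc, f"fs.write:{path}")
        self.files[path] = data


# Tool: a libc-like convenience wrapper that only composes primitives.
def append_line(rt: Runtime, proc: AgentProcess, path: str, line: str) -> None:
    rt.fs_write(proc, path, rt.fs_read(proc, path) + line + "\n")


rt = Runtime()
rt.files["/notes.txt"] = ""
proc = AgentProcess(pid=1, parent=None,
                    capabilities={"fs.read:/notes.txt"},
                    one_shot={"fs.write:/notes.txt"})
append_line(rt, proc, "/notes.txt", "hello")      # consumes the one-shot write grant
try:
    append_line(rt, proc, "/notes.txt", "again")  # read is allowed, write is denied
except CapabilityError as err:
    print(err)  # pid 1 lacks fs.write:/notes.txt
```

Because `append_line` only composes primitives, a buggy or model-generated tool in this sketch cannot exceed the process's capabilities, and the denied write still leaves an audit entry. A real runtime would presumably match capabilities by path prefix or policy rather than by exact string.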