LACUNA：作為遞歸程式孔洞的安全智能體

摘要

大型語言模型代理越來越常透過編寫程式碼來執行任務，但驅動代理的執行環境與模型生成的程式碼之間仍存在著鴻溝。執行環境掌管著迴圈、上下文與控制流程，而模型對此幾乎沒有發言權。讓模型編寫的程式碼能夠形塑執行環境本身，將使代理更具表達力，但也會加劇安全問題。模型可能因提示注入而偏離方向、調用錯誤工具，或在過程中失敗而留下不一致的狀態，而當程式碼能夠形塑執行環境時，這類失敗的影響範圍遠比程式碼僅表達單一動作時更廣。我們提出 LACUNA，一種在保持安全性的同時消除此鴻溝的代理程式設計模型。每個代理動作皆為帶有型別的呼叫 `agent[T](task)`，當執行到達該呼叫時，LLM 會填入程式碼，且該程式碼在執行前會與周圍的程式進行型別檢查。由於每個動作是整體被接受或拒絕，被拒絕的動作不會對環境造成影響，且其編譯器診斷資訊會驅動重試機制。相同的檢查也限制了動作可使用哪些工具與資料，以及它們的流動方式。我們的原始表達方式可將 ReAct 迴圈、子代理、技能、並行分解與多模型規劃視為一般的控制流程。我們在包含 BrowseComp-Plus 與 τ²-bench 的測試案例集上評估 LACUNA。在 BrowseComp-Plus 上，8.6% 的生成在執行前被拒絕，平均每次查詢有 0.7 次重試，代理達到 27.1% 的正確率。在 τ²-bench 上，LACUNA 使用能力強大的模型解決了四個領域中 392 個任務的 76.0%，與基準代理表現相當。

English

LLM agents increasingly act by writing code, yet a split persists between the runtime that drives the agent and the code the model writes. The runtime owns the loop, context, and control flow, and the model has little say over any of them. Letting model-written code shape the runtime itself would make agents more expressive, but it would also sharpen safety problems. A model can be diverted by a prompt injection, call the wrong tool, or fail partway and leave an inconsistent state, and each such failure reaches further when the code shapes the runtime than when it expresses a single action. We present LACUNA, a programming model for agents that closes this split while preserving safety. Each agent action is a typed call agent[T](task) that the LLM fills with code when execution reaches it, and the code is type-checked against the surrounding program before it runs. Because each action is accepted or rejected as a whole, a rejected one leaves the environment untouched, and its compiler diagnostics drive a retry. The same check also bounds which tools and data an action may use and how they flow. Our primitive expresses ReAct loops, sub-agents, skills, parallel decomposition, and multi-model planning as ordinary control flow. We evaluate LACUNA on a collection of test cases, BrowseComp-Plus, and τ^2-bench. On BrowseComp-Plus, 8.6% of generations are rejected before execution, with 0.7 retries per query on average, and the agent reaches 27.1% accuracy. On τ^2-bench, LACUNA solves 76.0% of 392 tasks across four domains with a capable model, on par with the baseline agent.