
LLM-in-Sandbox Elicits General Agentic Intelligence

January 22, 2026
作者: Daixuan Cheng, Shaohan Huang, Yuxian Gu, Huatong Song, Guoxin Chen, Li Dong, Wayne Xin Zhao, Ji-Rong Wen, Furu Wei
cs.AI

Abstract

We introduce LLM-in-Sandbox, which enables LLMs to explore within a code sandbox (i.e., a virtual computer) to elicit general intelligence in non-code domains. We first demonstrate that strong LLMs, without additional training, exhibit the generalization capability to leverage the code sandbox for non-code tasks. For example, LLMs spontaneously access external resources to acquire new knowledge, leverage the file system to handle long contexts, and execute scripts to satisfy formatting requirements. We further show that these agentic capabilities can be enhanced through LLM-in-Sandbox Reinforcement Learning (LLM-in-Sandbox-RL), which uses only non-agentic data to train models for sandbox exploration. Experiments demonstrate that LLM-in-Sandbox, in both the training-free and post-trained settings, achieves robust generalization spanning mathematics, physics, chemistry, biomedicine, long-context understanding, and instruction following. Finally, we analyze LLM-in-Sandbox's efficiency from computational and system perspectives, and open-source it as a Python package to facilitate real-world deployment.
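To make the core idea concrete, the sketch below shows a minimal sandbox-execution primitive: a model-generated script is run in an isolated subprocess and its output is returned to the model as an observation. This is a hypothetical illustration, not the released package's API; `run_in_sandbox` is an assumed helper name, and a production sandbox would add filesystem and network isolation (e.g., a container) rather than a bare subprocess.

```python
import subprocess
import sys
import tempfile
import textwrap

def run_in_sandbox(script: str, timeout: float = 10.0) -> str:
    """Execute a model-generated Python script in a subprocess and
    return its combined stdout/stderr as the model's observation.

    Hypothetical sketch: a real sandbox would isolate the filesystem
    and network instead of trusting a plain subprocess.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(textwrap.dedent(script))
        path = f.name
    result = subprocess.run(
        [sys.executable, path],
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout + result.stderr

# Example: the "model" proposes a script that satisfies a formatting
# requirement (thousands separators) by computing and printing a value.
observation = run_in_sandbox("print(f'{2**10:,}')")
print(observation.strip())  # → 1,024
```

In the paper's setting this execute-and-observe step would sit inside an agentic loop, with the LLM reading each observation and deciding whether to write another script or emit a final answer.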