LLM-in-Sandbox Elicits General Agentic Intelligence
January 22, 2026
Authors: Daixuan Cheng, Shaohan Huang, Yuxian Gu, Huatong Song, Guoxin Chen, Li Dong, Wayne Xin Zhao, Ji-Rong Wen, Furu Wei
cs.AI
Abstract
We introduce LLM-in-Sandbox, which enables LLMs to explore within a code sandbox (i.e., a virtual computer) to elicit general intelligence in non-code domains. We first demonstrate that strong LLMs, without additional training, exhibit generalization capabilities in leveraging the code sandbox for non-code tasks. For example, LLMs spontaneously access external resources to acquire new knowledge, leverage the file system to handle long contexts, and execute scripts to satisfy formatting requirements. We further show that these agentic capabilities can be enhanced through LLM-in-Sandbox Reinforcement Learning (LLM-in-Sandbox-RL), which uses only non-agentic data to train models for sandbox exploration. Experiments demonstrate that LLM-in-Sandbox, in both training-free and post-trained settings, achieves robust generalization spanning mathematics, physics, chemistry, biomedicine, long-context understanding, and instruction following. Finally, we analyze LLM-in-Sandbox's efficiency from computational and system perspectives, and open-source it as a Python package to facilitate real-world deployment.