LLM-in-Sandbox Elicits General Agentic Intelligence

Abstract

We introduce LLM-in-Sandbox, which lets LLMs explore within a code sandbox (i.e., a virtual computer) to elicit general intelligence in non-code domains. We first demonstrate that strong LLMs, without additional training, generalize to using the code sandbox for non-code tasks. For example, LLMs spontaneously access external resources to acquire new knowledge, use the file system to handle long contexts, and execute scripts to satisfy formatting requirements. We further show that these agentic capabilities can be enhanced through LLM-in-Sandbox Reinforcement Learning (LLM-in-Sandbox-RL), which uses only non-agentic data to train models for sandbox exploration. Experiments demonstrate that LLM-in-Sandbox, in both training-free and post-trained settings, achieves robust generalization spanning mathematics, physics, chemistry, biomedicine, long-context understanding, and instruction following. Finally, we analyze LLM-in-Sandbox's efficiency from computational and system perspectives, and open-source it as a Python package to facilitate real-world deployment.
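As a rough illustration of the file-system behavior described above, the following sketch stores a long document as a file and runs a small script over it, instead of placing the whole text in the prompt. It is a minimal sketch under stated assumptions, not the API of the released package: the names `run_in_sandbox`, `find_lines`, `context.txt`, and `step.py` are hypothetical, and a temporary directory stands in for a real sandbox without providing any isolation.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_sandbox(workdir: Path, script: str, timeout: float = 10.0) -> str:
    """Run a Python script inside workdir and return its stdout.

    Assumption: a plain temporary directory stands in for the sandbox.
    It offers no isolation; a real deployment would use a container or VM.
    """
    script_path = workdir / "step.py"  # hypothetical file name
    script_path.write_text(script, encoding="utf-8")
    result = subprocess.run(
        [sys.executable, script_path.name],
        cwd=workdir,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout


def find_lines(document: str, keyword: str) -> list[str]:
    """Write a long document to a file, then search it with a script."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        (workdir / "context.txt").write_text(document, encoding="utf-8")
        # Hypothetical script of the kind a model might emit for this step.
        script = (
            "import sys\n"
            f"keyword = {keyword!r}\n"
            "with open('context.txt', encoding='utf-8') as f:\n"
            "    for line in f:\n"
            "        if keyword in line:\n"
            "            sys.stdout.write(line)\n"
        )
        return run_in_sandbox(workdir, script).splitlines()
```

Calling `find_lines(long_text, "answer")` returns only the matching lines, so the model would need to read just those lines rather than the full document.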
