LLM-in-Sandbox Elicits General Agentic Intelligence

Abstract

We introduce LLM-in-Sandbox, enabling LLMs to explore within a code sandbox (i.e., a virtual computer), to elicit general intelligence in non-code domains. We first demonstrate that strong LLMs, without additional training, exhibit generalization capabilities to leverage the code sandbox for non-code tasks. For example, LLMs spontaneously access external resources to acquire new knowledge, leverage the file system to handle long contexts, and execute scripts to satisfy formatting requirements. We further show that these agentic capabilities can be enhanced through LLM-in-Sandbox Reinforcement Learning (LLM-in-Sandbox-RL), which uses only non-agentic data to train models for sandbox exploration. Experiments demonstrate that LLM-in-Sandbox, in both training-free and post-trained settings, achieves robust generalization spanning mathematics, physics, chemistry, biomedicine, long-context understanding, and instruction following. Finally, we analyze LLM-in-Sandbox's efficiency from computational and system perspectives, and open-source it as a Python package to facilitate real-world deployment.
