LLM-in-Sandbox Elicits General Agentic Intelligence

Abstract

We introduce LLM-in-Sandbox, enabling LLMs to explore within a code sandbox (i.e., a virtual computer) to elicit general intelligence in non-code domains. We first demonstrate that strong LLMs, without additional training, exhibit generalization capabilities to leverage the code sandbox for non-code tasks. For example, LLMs spontaneously access external resources to acquire new knowledge, leverage the file system to handle long contexts, and execute scripts to satisfy formatting requirements. We further show that these agentic capabilities can be enhanced through LLM-in-Sandbox Reinforcement Learning (LLM-in-Sandbox-RL), which uses only non-agentic data to train models for sandbox exploration. Experiments demonstrate that LLM-in-Sandbox, in both training-free and post-trained settings, achieves robust generalization spanning mathematics, physics, chemistry, biomedicine, long-context understanding, and instruction following. Finally, we analyze LLM-in-Sandbox's efficiency from computational and system perspectives, and open-source it as a Python package to facilitate real-world deployment.
