EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis

Abstract

Large language models (LLMs) are expected to be trained to act as agents in diverse real-world environments, but this process relies on rich and varied tool-interaction sandboxes. However, access to real systems is often restricted, LLM-simulated environments are prone to hallucinations and inconsistencies, and manually built sandboxes are hard to scale. In this paper, we propose EnvScaler, an automated framework for scaling tool-interaction environments via programmatic synthesis. EnvScaler comprises two components. First, SkelBuilder constructs diverse environment skeletons through topic mining, logic modeling, and quality evaluation. Then, ScenGenerator generates multiple task scenarios and rule-based trajectory validation functions for each environment. With EnvScaler, we synthesize 191 environments and about 7K scenarios and use them for Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) of Qwen3 series models. Results on three benchmarks show that EnvScaler significantly improves LLMs' ability to solve tasks in complex environments involving multi-turn, multi-tool interactions. We release our code and data at https://github.com/RUC-NLPIR/EnvScaler.
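To make "programmatic synthesis" and "rule-based trajectory validation" more concrete, the following toy sketch shows an environment whose state and tools are ordinary Python code, together with one scenario and a rule-based check. Everything in it is hypothetical and is not taken from the EnvScaler release: the `LibraryEnv` domain, the `Scenario` structure, the `(tool_name, kwargs)` trajectory format, and the specific checking rules. It is a minimal sketch that assumes an environment is a stateful class with deterministic tool methods, and that validation inspects both the final state and the sequence of tool calls.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical environment skeleton: explicit state plus tool methods that
# read and mutate it deterministically (no LLM simulates the responses).
@dataclass
class LibraryEnv:
    books: dict[str, dict[str, Any]] = field(default_factory=dict)

    def search_book(self, title: str) -> list[str]:
        """Tool: return IDs of books whose title contains the query."""
        return [bid for bid, b in self.books.items()
                if title.lower() in b["title"].lower()]

    def borrow_book(self, book_id: str, user: str) -> str:
        """Tool: borrow a book if it exists and is not already borrowed."""
        book = self.books.get(book_id)
        if book is None:
            return "error: unknown book"
        if book["borrower"] is not None:
            return "error: already borrowed"
        book["borrower"] = user
        return "ok"

    def call(self, tool: str, **kwargs: Any) -> Any:
        """Dispatch a tool call by name, as an agent would issue it."""
        return getattr(self, tool)(**kwargs)

# Hypothetical scenario: initial state, task text, and a rule-based checker.
Trajectory = list[tuple[str, dict[str, Any]]]

@dataclass
class Scenario:
    initial_state: dict[str, dict[str, Any]]
    task: str
    check: Callable[[LibraryEnv, Trajectory], bool]

def check_borrow(env: LibraryEnv, trajectory: Trajectory) -> bool:
    # Rule 1: the target book ends up borrowed by alice.
    state_ok = env.books["b2"]["borrower"] == "alice"
    # Rule 2: the agent never tried to borrow any other book.
    no_side_effects = all(kwargs.get("book_id") == "b2"
                          for tool, kwargs in trajectory
                          if tool == "borrow_book")
    return state_ok and no_side_effects

def run(scenario: Scenario, trajectory: Trajectory) -> bool:
    # Replay the trajectory on a fresh copy of the initial state, then validate.
    env = LibraryEnv(books=copy.deepcopy(scenario.initial_state))
    for tool, kwargs in trajectory:
        env.call(tool, **kwargs)
    return scenario.check(env, trajectory)

scenario = Scenario(
    initial_state={"b1": {"title": "Dune", "borrower": None},
                   "b2": {"title": "Dune Messiah", "borrower": None}},
    task="Borrow 'Dune Messiah' for alice.",
    check=check_borrow,
)
good = [("search_book", {"title": "Messiah"}),
        ("borrow_book", {"book_id": "b2", "user": "alice"})]
bad = [("borrow_book", {"book_id": "b1", "user": "alice"})]
print(run(scenario, good))  # True
print(run(scenario, bad))   # False
```

Because the state lives in code rather than in an LLM's context, the same trajectory always yields the same result, and a check such as `check_borrow` can reward or reject a trajectory without a judge model.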
