EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis

Abstract

Large language models (LLMs) are expected to be trained to act as agents in a variety of real-world environments, a process that depends on rich and varied tool-interaction sandboxes. However, access to real systems is often restricted, LLM-simulated environments are prone to hallucinations and inconsistencies, and manually built sandboxes are hard to scale. In this paper, we propose EnvScaler, an automated framework for building scalable tool-interaction environments via programmatic synthesis. EnvScaler comprises two components. First, SkelBuilder constructs diverse environment skeletons through topic mining, logic modeling, and quality evaluation. Then, ScenGenerator generates multiple task scenarios and rule-based trajectory validation functions for each environment. With EnvScaler, we synthesize 191 environments and about 7K scenarios, and apply them to Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) for Qwen3 series models. Results on three benchmarks show that EnvScaler significantly improves LLMs' ability to solve tasks in complex environments involving multi-turn, multi-tool interactions. We release our code and data at https://github.com/RUC-NLPIR/EnvScaler.
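To make the two-component design concrete, the following is a minimal Python sketch of what a programmatically synthesized environment and a rule-based validation function might look like. It is an illustration only and is not taken from the EnvScaler codebase: the class `ToyInventoryEnv`, the `Scenario` container, the helper `run_and_score`, and the choice to score a trajectory by checking the final environment state are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyInventoryEnv:
    """Hypothetical environment skeleton: explicit state plus tool methods."""
    stock: dict[str, int] = field(default_factory=dict)

    def add_item(self, name: str, qty: int) -> str:
        # Tool: increase the stock of an item.
        if qty <= 0:
            return "error: qty must be positive"
        self.stock[name] = self.stock.get(name, 0) + qty
        return "ok"

    def remove_item(self, name: str, qty: int) -> str:
        # Tool: decrease the stock of an item if enough is available.
        if self.stock.get(name, 0) < qty:
            return "error: insufficient stock"
        self.stock[name] -= qty
        return "ok"


@dataclass
class Scenario:
    """Hypothetical scenario: initial state, task text, and rule-based checks."""
    initial_stock: dict[str, int]
    task: str
    checks: list[Callable[[ToyInventoryEnv], bool]]


def run_and_score(scenario: Scenario, tool_calls: list[tuple[str, dict]]) -> float:
    """Replay a trajectory of tool calls and return the fraction of checks passed."""
    env = ToyInventoryEnv(stock=dict(scenario.initial_stock))
    for tool_name, kwargs in tool_calls:
        getattr(env, tool_name)(**kwargs)
    passed = sum(check(env) for check in scenario.checks)
    return passed / len(scenario.checks)


scenario = Scenario(
    initial_stock={"apple": 5},
    task="Ship 3 apples and restock 10 pears.",
    checks=[
        lambda env: env.stock.get("apple", 0) == 2,
        lambda env: env.stock.get("pear", 0) == 10,
    ],
)

full_trajectory = [
    ("remove_item", {"name": "apple", "qty": 3}),
    ("add_item", {"name": "pear", "qty": 10}),
]
score = run_and_score(scenario, full_trajectory)
print(score)
```

Because the environment is ordinary code with deterministic state, the checks are plain predicates over that state rather than judgments from an LLM, which is the property that lets such a score serve as a filter for SFT data or as a reward signal for RL.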
