EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis
January 9, 2026
Authors: Xiaoshuai Song, Haofei Chang, Guanting Dong, Yutao Zhu, Zhicheng Dou, Ji-Rong Wen
cs.AI
Abstract
Large language models (LLMs) are expected to be trained to act as agents in various real-world environments, but this process relies on rich and varied tool-interaction sandboxes. However, access to real systems is often restricted; LLM-simulated environments are prone to hallucinations and inconsistencies; and manually built sandboxes are hard to scale. In this paper, we propose EnvScaler, an automated framework for scalable tool-interaction environments via programmatic synthesis. EnvScaler comprises two components. First, SkelBuilder constructs diverse environment skeletons through topic mining, logic modeling, and quality evaluation. Then, ScenGenerator generates multiple task scenarios and rule-based trajectory validation functions for each environment. With EnvScaler, we synthesize 191 environments and about 7K scenarios, and apply them to Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) for Qwen3 series models. Results on three benchmarks show that EnvScaler significantly improves LLMs' ability to solve tasks in complex environments involving multi-turn, multi-tool interactions. We release our code and data at https://github.com/RUC-NLPIR/EnvScaler.
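To make the idea of a programmatically defined environment with a rule-based validation function more concrete, here is a minimal Python sketch. It is not taken from the EnvScaler code base: the class `ToyLibraryEnv`, its tools, the helper `run_trajectory`, and the checker `validate_loan_b1` are all hypothetical names chosen for illustration. The sketch also assumes that validation inspects the final environment state, which the abstract does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyLibraryEnv:
    """Hypothetical environment: a small state plus tools that read or mutate it."""
    books: Dict[str, str] = field(
        default_factory=lambda: {"b1": "available", "b2": "loaned"}
    )

    def check_status(self, book_id: str) -> str:
        # Read-only tool: report the status of a book.
        return self.books.get(book_id, "unknown")

    def loan_book(self, book_id: str) -> str:
        # State-changing tool: only an available book can be loaned.
        if self.books.get(book_id) != "available":
            return "error: book not available"
        self.books[book_id] = "loaned"
        return "ok"


# A trajectory is an ordered list of (tool_name, arguments) calls.
Trajectory = List[Tuple[str, Dict[str, str]]]


def run_trajectory(env: ToyLibraryEnv, trajectory: Trajectory) -> List[str]:
    """Execute each tool call against the environment and collect observations."""
    observations = []
    for tool_name, args in trajectory:
        tool = getattr(env, tool_name)
        observations.append(tool(**args))
    return observations


def validate_loan_b1(env: ToyLibraryEnv) -> bool:
    """Rule-based check (assumed form): b1 must end up loaned, b2 must be unchanged."""
    return env.books["b1"] == "loaned" and env.books["b2"] == "loaned"


env = ToyLibraryEnv()
trajectory: Trajectory = [
    ("check_status", {"book_id": "b1"}),
    ("loan_book", {"book_id": "b1"}),
]
observations = run_trajectory(env, trajectory)
success = validate_loan_b1(env)
```

Because the environment is ordinary code, every tool call has a deterministic effect, and the validator can score a trajectory without asking an LLM to judge it. That is the property the abstract contrasts with LLM-simulated environments.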