ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas
January 29, 2026
Authors: Xiaoyu Tian, Haotian Wang, Shuaiting Chen, Hao Zhou, Kaichi Yu, Yudian Zhang, Jade Ouyang, Junxi Yin, Jiong Chen, Baoyan Guo, Lei Zhang, Junjie Tao, Yuansheng Song, Ming Cui, Chengwei Liu
cs.AI
Abstract
Large language models (LLMs) are increasingly deployed as tool-augmented agents for multi-step decision making, yet training robust tool-using agents remains challenging. Existing methods still require manual intervention, depend on non-verifiable simulated environments, rely exclusively on either supervised fine-tuning (SFT) or reinforcement learning (RL), and struggle with stable long-horizon, multi-turn learning. To address these challenges, we introduce ASTRA, a fully automated end-to-end framework for training tool-augmented language model agents via scalable data synthesis and verifiable reinforcement learning. ASTRA integrates two complementary components. First, a data pipeline leverages the static topology of tool-call graphs to synthesize diverse, structurally grounded trajectories, instilling broad and transferable tool-use competence. Second, an environment synthesis framework captures the rich, compositional topology of human semantic reasoning, converting decomposed question-answer traces into independent, code-executable, and rule-verifiable environments that enable deterministic multi-turn RL. On top of these components, we develop a unified training recipe that integrates SFT with online RL using trajectory-level rewards to balance task completion against interaction efficiency. Experiments on multiple agentic tool-use benchmarks show that ASTRA-trained models achieve state-of-the-art performance at comparable scales, approaching closed-source systems while preserving core reasoning ability. We release the full pipelines, environments, and trained models at https://github.com/LianjiaTech/astra.
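To make two of the abstract's mechanisms concrete, below is a minimal Python sketch (our illustration, not code from the ASTRA repository) of a rule-verifiable environment whose final state can be checked deterministically in code, paired with a trajectory-level reward that trades off task completion against interaction efficiency. All class and function names, the toy task, and the specific linear turn penalty are illustrative assumptions.

```python
# Minimal sketch, assuming a toy task: the agent must drive `state` to
# `target` via tool calls. Names and reward form are hypothetical.
from dataclasses import dataclass


@dataclass
class VerifiableEnv:
    """Toy multi-turn environment with a deterministic, rule-based check."""
    target: int
    state: int = 0
    turns: int = 0
    max_turns: int = 8

    def step(self, delta: int) -> bool:
        """Apply one tool call; return True while the episode is still open."""
        self.state += delta
        self.turns += 1
        return self.turns < self.max_turns and self.state != self.target

    def verify(self) -> bool:
        """Deterministic rule verifying the final state (no LLM judge)."""
        return self.state == self.target


def trajectory_reward(env: VerifiableEnv, efficiency_weight: float = 0.1) -> float:
    """Trajectory-level reward: 1.0 for verified completion, minus a small
    per-turn penalty so shorter successful trajectories score higher."""
    completion = 1.0 if env.verify() else 0.0
    return completion - efficiency_weight * env.turns


if __name__ == "__main__":
    env = VerifiableEnv(target=3)
    for delta in (1, 1, 1):  # a 3-turn trajectory that reaches the target
        env.step(delta)
    print(trajectory_reward(env))  # 1.0 - 0.1 * 3 = 0.7
```

In ASTRA itself, such environments and their verification rules are synthesized automatically from decomposed question-answer traces rather than hand-written as above; the sketch only fixes the interface shape that "code-executable, rule-verifiable, trajectory-level reward" implies.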