Scalable Terminal Task Synthesis Based on Skill Graphs

Abstract

Terminal agents have demonstrated strong potential for autonomous command-line execution, yet their training remains constrained by the scarcity of high-quality and diverse execution trajectories. Existing approaches mitigate this bottleneck by synthesizing large-scale terminal task instances for trajectory sampling. However, they primarily focus on scaling the number of tasks while providing limited control over the diversity of execution trajectories that agents actually experience during training. In this paper, we present SkillSynth, an automated framework for terminal task synthesis built on a scenario-mediated skill graph. SkillSynth first constructs a large-scale skill graph, where scenarios serve as intermediate transition nodes that connect diverse command-line skills. It then samples paths from this graph as abstractions of real-world workflows, and uses a multi-agent harness to instantiate them into executable task instances. By grounding task synthesis in graph-sampled workflow paths, SkillSynth explicitly controls the diversity of minimal execution trajectories required to solve the synthesized tasks. Experiments on Terminal-Bench demonstrate the effectiveness of SkillSynth. Moreover, task instances synthesized by SkillSynth have been adopted to train Hy3 Preview, contributing to its enhanced agentic capabilities in terminal-based settings.
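As a rough illustration of the path-sampling idea described above, the following is a minimal sketch and not the authors' implementation. It assumes the skill graph can be modeled as two adjacency maps in which skills connect to each other only through scenario nodes, and it samples a workflow path by a uniform random walk. The skill names, scenario names, and the function `sample_skill_path` are all hypothetical; the abstract does not specify how the real graph is built or how paths are sampled.

```python
import random

# Hypothetical toy graph (not from the paper): skills are linked only
# through scenario nodes, mirroring the "scenario-mediated" structure.
SKILL_TO_SCENARIOS = {
    "grep_logs": ["debug_service"],
    "edit_config": ["debug_service", "deploy_app"],
    "restart_service": ["deploy_app"],
    "run_tests": ["deploy_app"],
}
SCENARIO_TO_SKILLS = {
    "debug_service": ["grep_logs", "edit_config", "restart_service"],
    "deploy_app": ["edit_config", "run_tests", "restart_service"],
}


def sample_skill_path(start_skill, num_skills, rng):
    """Random walk alternating skill -> scenario -> skill.

    Returns a list [skill, scenario, skill, ...] containing
    `num_skills` skills. Uniform sampling is an assumption here.
    """
    path = [start_skill]
    skill = start_skill
    for _ in range(num_skills - 1):
        # Pick a scenario the current skill can lead into.
        scenario = rng.choice(SKILL_TO_SCENARIOS[skill])
        # Pick a different skill that the scenario calls for.
        candidates = [s for s in SCENARIO_TO_SKILLS[scenario] if s != skill]
        skill = rng.choice(candidates)
        path += [scenario, skill]
    return path


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for repeatable sampling
    print(sample_skill_path("grep_logs", 3, rng))
```

In a sketch like this, the skills at the even positions of a sampled path would be the ones a downstream task is expected to exercise, which is one plausible reading of how path sampling gives control over the trajectories needed to solve a task. Turning a path into an executable task instance is the job of the multi-agent harness and is not modeled here.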