ClawGym: A Scalable Framework for Building Effective Claw Agents

April 29, 2026
Authors: Fei Bai, Huatong Song, Shuang Sun, Daixuan Cheng, Yike Yang, Chuan Hao, Renyuan Li, Feng Chang, Yuan Wei, Ran Tao, Bryan Dai, Jian Yang, Wayne Xin Zhao
cs.AI

Abstract

Claw-style environments support multi-step workflows over local files, tools, and persistent workspace states. However, scalable development around these environments remains constrained by the absence of a systematic framework, especially one for synthesizing verifiable training data and integrating it with agent training and diagnostic evaluation. To address this challenge, we present ClawGym, a scalable framework that supports the full lifecycle of Claw-style personal agent development. Concretely, we construct ClawGym-SynData, a diverse dataset of 13.5K filtered tasks synthesized from persona-driven intents and skill-grounded operations, paired with realistic mock workspaces and hybrid verification mechanisms. We then train a family of capable Claw-style models, termed ClawGym-Agents, through supervised fine-tuning on black-box rollout trajectories, and further explore reinforcement learning via a lightweight pipeline that parallelizes rollouts across per-task sandboxes. To support reliable evaluation, we further construct ClawGym-Bench, a benchmark of 200 instances calibrated through automated filtering and human-LLM review. Relevant resources will soon be released at https://github.com/ClawGym.
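
The abstract gives no implementation details, so the following is only a minimal sketch of what running rollouts in parallel across per-task sandboxes might look like, with a rule-based check on the final workspace state standing in for verification. Every name here (`run_task`, `rollout_all`, the task dictionary layout, and the toy policies) is hypothetical and is not taken from ClawGym. A temporary directory stands in for a sandbox, and a fixed Python function stands in for an agent.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_task(task):
    """Run one hypothetical task in its own temporary workspace and verify it."""
    with tempfile.TemporaryDirectory() as sandbox:
        workspace = Path(sandbox)
        # Seed the mock workspace with the task's initial files.
        for name, content in task["files"].items():
            (workspace / name).write_text(content)
        # Stand-in for an agent rollout; a real agent would choose its own actions.
        task["policy"](workspace)
        # Rule-based check on the final workspace state (assumed verifier).
        return task["id"], task["check"](workspace)


def rollout_all(tasks, max_workers=4):
    """Run every task concurrently, each one isolated in its own directory."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_task, tasks))


def rename_policy(workspace):
    # Toy policy that completes the task.
    (workspace / "draft.txt").rename(workspace / "final.txt")


def noop_policy(workspace):
    # Toy policy that does nothing, so verification should fail.
    pass


def renamed(workspace):
    return (workspace / "final.txt").exists() and not (workspace / "draft.txt").exists()


TASKS = [
    {"id": "rename-ok", "files": {"draft.txt": "hello"},
     "policy": rename_policy, "check": renamed},
    {"id": "rename-missed", "files": {"draft.txt": "hello"},
     "policy": noop_policy, "check": renamed},
]

results = rollout_all(TASKS)
```

Because each task writes only inside its own directory, the rollouts cannot interfere with one another, which is the property that makes parallel execution safe in this sketch. The hybrid verification the abstract mentions presumably combines more than one kind of check; only a state-based check is illustrated here.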