ClawGym: A Scalable Framework for Building Effective Claw Agents

Abstract

Claw-style environments support multi-step workflows over local files, tools, and persistent workspace states. However, scalable development around these environments remains constrained by the absence of a systematic framework, especially one for synthesizing verifiable training data and integrating it with agent training and diagnostic evaluation. To address this challenge, we present ClawGym, a scalable framework that supports the full lifecycle of Claw-style personal agent development. Concretely, we construct ClawGym-SynData, a diverse dataset of 13.5K filtered tasks synthesized from persona-driven intents and skill-grounded operations, paired with realistic mock workspaces and hybrid verification mechanisms. We then train a family of capable Claw-style models, termed ClawGym-Agents, through supervised fine-tuning on black-box rollout trajectories, and further explore reinforcement learning via a lightweight pipeline that parallelizes rollouts across per-task sandboxes. To support reliable evaluation, we further construct ClawGym-Bench, a benchmark of 200 instances calibrated through automated filtering and human-LLM review. Relevant resources will soon be released at https://github.com/ClawGym.
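
The idea of running rollouts in parallel across per-task sandboxes, each checked against its final workspace state, can be illustrated with a minimal sketch. This is not the ClawGym pipeline: the task dictionaries, `run_rollout`, `rollout_batch`, the toy policies, and the `renamed` check are all hypothetical names introduced here, and a temporary directory stands in for a real sandbox.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_rollout(task):
    """Run one hypothetical task in its own temporary workspace and verify it."""
    with tempfile.TemporaryDirectory() as sandbox:
        workspace = Path(sandbox)
        # Seed the mock workspace with the task's initial files.
        for name, text in task["files"].items():
            (workspace / name).write_text(text)
        # Stand-in for the agent: a plain callable that edits the workspace.
        task["policy"](workspace)
        # Rule-based check on the final workspace state (assumed verifier).
        passed = task["check"](workspace)
    return task["id"], 1.0 if passed else 0.0


def rollout_batch(tasks, max_workers=4):
    """Run rollouts concurrently, one isolated sandbox per task."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_rollout, tasks))


def rename_policy(workspace):
    # Toy "agent" that completes the task: rename notes.txt to notes.md.
    (workspace / "notes.txt").rename(workspace / "notes.md")


def noop_policy(workspace):
    # Toy "agent" that does nothing, so the check should fail.
    pass


def renamed(workspace):
    # Passes only if the new file exists and the old one is gone.
    return (workspace / "notes.md").exists() and not (workspace / "notes.txt").exists()


# Illustrative tasks only; not drawn from ClawGym-SynData.
TASKS = [
    {"id": "t1", "files": {"notes.txt": "hello"}, "policy": rename_policy, "check": renamed},
    {"id": "t2", "files": {"notes.txt": "hello"}, "policy": noop_policy, "check": renamed},
]

if __name__ == "__main__":
    print(rollout_batch(TASKS))
```

Because each task gets its own directory, concurrent rollouts cannot interfere with one another's files, and the per-task score could serve as a reward signal in a reinforcement learning loop under these assumptions.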
