ClawGym: A Scalable Framework for Building Effective Claw Agents

Abstract

Claw-style environments support multi-step workflows over local files, tools, and persistent workspace states. However, scalable development around these environments remains constrained by the absence of a systematic framework, especially one for synthesizing verifiable training data and integrating it with agent training and diagnostic evaluation. To address this challenge, we present ClawGym, a scalable framework that supports the full lifecycle of Claw-style personal agent development. Concretely, we construct ClawGym-SynData, a diverse dataset of 13.5K filtered tasks synthesized from persona-driven intents and skill-grounded operations, paired with realistic mock workspaces and hybrid verification mechanisms. We then train a family of capable Claw-style models, termed ClawGym-Agents, through supervised fine-tuning on black-box rollout trajectories, and further explore reinforcement learning via a lightweight pipeline that parallelizes rollouts across per-task sandboxes. To support reliable evaluation, we further construct ClawGym-Bench, a benchmark of 200 instances calibrated through automated filtering and human-LLM review. Relevant resources will soon be released at https://github.com/ClawGym.
