ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

Abstract

GUI agents drive applications through their visual interfaces rather than programmatic APIs. By interacting with arbitrary software via taps, swipes, and keystrokes, they reach a long tail of applications that CLI-based agents cannot. Yet progress in this area is bottlenecked less by modeling capacity than by the absence of a coherent full-stack infrastructure: online RL training suffers from environment instability and closed pipelines, evaluation protocols drift silently across works, and trained agents rarely reach real users on real devices. We present ClawGUI, an open-source framework that addresses these three gaps within a single harness. ClawGUI-RL provides the first open-source GUI agent RL infrastructure with validated support for both parallel virtual environments and real physical devices, integrating GiGPO with a Process Reward Model for dense step-level supervision. ClawGUI-Eval enforces a fully standardized evaluation pipeline across 6 benchmarks and 11+ models, achieving 95.8% reproduction against official baselines. ClawGUI-Agent brings trained agents to Android, HarmonyOS, and iOS through 12+ chat platforms, with hybrid CLI-GUI control and persistent personalized memory. Trained end to end within this pipeline, ClawGUI-2B achieves a 17.1% Success Rate on MobileWorld GUI-Only, outperforming the same-scale MAI-UI-2B baseline by 6.0%.
