Learning from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents

Abstract

Computer-use agents (CUAs) have recently made substantial progress, but deploying a separate large expert model for each software domain remains expensive. Small open computer-use agents are more practical specialization targets, but they remain substantially weaker and exhibit uneven, domain-specific failures. A straightforward remedy is to synthesize large-scale training data for the target domain, yet we find that this naive approach yields only marginal improvements. Building on this observation, we introduce LearnWeak, an annotation-free specialization framework for small computer-use agents that uses a stronger reference agent to identify the student's weaknesses in the target domain, synthesize targeted tasks, and construct supervision automatically. LearnWeak further introduces an error-aware specialization objective that disentangles planning errors from execution errors, enabling more behaviorally precise updates than broad uniform supervision. On OSWorld, LearnWeak achieves average gains of 11.6 and 11.1 percentage points over EvoCUA-8B and OpenCUA-7B, respectively, across eight domains. We also validate that our student-aware dataset generation and training approaches outperform existing autonomous trajectory generation and training baselines. Our work highlights the importance of student awareness in both data synthesis and agent training, pointing toward a more principled and efficient path for specializing small computer-use agents in diverse domains.
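To make the idea of student-aware task synthesis concrete, the following is a minimal illustrative sketch and not the LearnWeak implementation. It assumes, hypothetically, that student and reference rollouts are logged as `(skill, succeeded)` pairs, measures the per-skill success gap between the two agents, and splits a task-synthesis budget in proportion to that gap. The function names, the skill labels, and the proportional allocation rule are all assumptions introduced for illustration.

```python
from collections import defaultdict


def weakness_gaps(student_results, reference_results):
    """Per-skill gap between reference and student success rates.

    Each argument is a list of (skill, succeeded) pairs. This input format
    is a hypothetical choice for the sketch, not one taken from the paper.
    """
    def success_rates(results):
        totals, wins = defaultdict(int), defaultdict(int)
        for skill, succeeded in results:
            totals[skill] += 1
            wins[skill] += int(succeeded)
        return {skill: wins[skill] / totals[skill] for skill in totals}

    student = success_rates(student_results)
    reference = success_rates(reference_results)
    # Count a skill as a weakness only where the reference beats the student.
    return {
        skill: max(0.0, reference[skill] - student.get(skill, 0.0))
        for skill in reference
    }


def allocate_budget(gaps, total_tasks):
    """Split a synthesis budget across skills in proportion to each gap."""
    total_gap = sum(gaps.values())
    if total_gap == 0:
        return {skill: 0 for skill in gaps}
    return {
        skill: round(total_tasks * gap / total_gap)
        for skill, gap in gaps.items()
    }


# Toy rollouts: the student fails every "formula" task and half the "chart" tasks.
student = [("formula", False), ("formula", False), ("chart", True), ("chart", False)]
reference = [("formula", True), ("formula", True), ("chart", True), ("chart", True)]

gaps = weakness_gaps(student, reference)
budget = allocate_budget(gaps, total_tasks=30)
print(gaps)
print(budget)
```

In this toy setting the larger share of new tasks goes to the skill where the student trails the reference most, in contrast to synthesizing tasks uniformly across the domain.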