Learning from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents

Abstract

Computer-use agents (CUAs) have recently made substantial progress, but deploying a separate large expert for each software domain remains expensive. Small open computer-use agents are more practical specialization targets, but they remain substantially weaker and exhibit uneven domain-specific failures. A straightforward remedy is to synthesize large-scale training data for the target domain, yet we find that this naive approach yields only marginal improvements. Building on this observation, we introduce LearnWeak, an annotation-free specialization framework for small computer-use agents that uses a stronger reference agent to identify the student's weaknesses in the target domain, synthesize targeted tasks, and construct supervision automatically. LearnWeak further introduces an error-aware specialization objective that disentangles planning and execution errors, enabling more behaviorally precise updates than broad uniform supervision. On OSWorld, LearnWeak achieves average gains of 11.6 and 11.1 percentage points over EvoCUA-8B and OpenCUA-7B, respectively, across eight domains. We also validate that our student-aware dataset generation and training approaches outperform existing autonomous trajectory generation and training baselines. Our work highlights the importance of student awareness in both data synthesis and agent training, pointing toward a more principled and efficient path for specializing small computer-use agents in diverse domains.
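
The weakness-identification step described above can be illustrated with a minimal sketch. This is not the paper's implementation: the `Episode` record, the skill tags, the `plan_matches` flag, and the rule that a failure with a matching plan counts as an execution error are all assumptions made for illustration. The sketch only shows the general idea of comparing a student against a stronger reference agent, ranking skills by the success gap, and separating planning failures from execution failures.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Episode:
    skill: str          # hypothetical skill tag, e.g. "spreadsheet.sort"
    student_ok: bool    # did the student agent complete the task?
    reference_ok: bool  # did the stronger reference agent complete it?
    plan_matches: bool  # did the student's high-level plan agree with the reference's?


def error_type(ep: Episode) -> str:
    """Label a student failure as planning or execution (assumed rule, not from the paper)."""
    if ep.student_ok:
        return "none"
    return "execution" if ep.plan_matches else "planning"


def weakness_profile(episodes):
    """Per skill: reference-minus-student success gap, plus counts of each error type."""
    stats = defaultdict(lambda: {"n": 0, "student": 0, "reference": 0,
                                 "planning": 0, "execution": 0})
    for ep in episodes:
        s = stats[ep.skill]
        s["n"] += 1
        s["student"] += ep.student_ok
        s["reference"] += ep.reference_ok
        kind = error_type(ep)
        if kind != "none":
            s[kind] += 1
    return {
        skill: {
            "gap": (s["reference"] - s["student"]) / s["n"],
            "planning": s["planning"],
            "execution": s["execution"],
        }
        for skill, s in stats.items()
    }


def pick_targets(profile, k=1):
    """Return the k skills with the largest gap, as candidates for targeted task synthesis."""
    ranked = sorted(profile, key=lambda skill: profile[skill]["gap"], reverse=True)
    return ranked[:k]


# Toy data: four hypothetical episodes across two skills.
episodes = [
    Episode("spreadsheet.sort", False, True, True),
    Episode("spreadsheet.sort", False, True, False),
    Episode("browser.bookmark", True, True, True),
    Episode("browser.bookmark", False, True, True),
]
profile = weakness_profile(episodes)
targets = pick_targets(profile, k=1)
print(targets, profile[targets[0]])
```

In this toy setup the student fails both sorting tasks that the reference solves, so that skill ranks first. Its two failures split into one planning error and one execution error, which is the kind of distinction an error-aware objective could use to supervise each failure type differently.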