Efficient Agent Training for Computer Use

May 20, 2025
Authors: Yanheng He, Jiahe Jin, Pengfei Liu
cs.AI

Abstract

Scaling up high-quality trajectory data has long been a critical bottleneck for developing human-like computer use agents. We introduce PC Agent-E, an efficient agent training framework that significantly reduces reliance on large-scale human demonstrations. Starting with just 312 human-annotated computer use trajectories, we further improved data quality by synthesizing diverse action decisions with Claude 3.7 Sonnet. Trained on these enriched trajectories, our PC Agent-E model achieved a remarkable 141% relative improvement, surpassing the strong Claude 3.7 Sonnet with extended thinking on WindowsAgentArena-V2, an improved benchmark we also released. Furthermore, PC Agent-E demonstrates strong generalizability to different operating systems on OSWorld. Our findings suggest that strong computer use capabilities can be stimulated from a small amount of high-quality trajectory data.