PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC

February 20, 2025
Authors: Haowei Liu, Xi Zhang, Haiyang Xu, Yuyang Wanyan, Junyang Wang, Ming Yan, Ji Zhang, Chunfeng Yuan, Changsheng Xu, Weiming Hu, Fei Huang
cs.AI

Abstract

In the field of multimodal large language model (MLLM)-based graphical user interface (GUI) agents, the PC scenario, compared with smartphones, not only features a more complex interactive environment but also involves more intricate intra- and inter-app workflows. To address these issues, we propose a hierarchical agent framework named PC-Agent. Specifically, from the perception perspective, we devise an Active Perception Module (APM) to compensate for the limited ability of current MLLMs to perceive screenshot content. From the decision-making perspective, to handle complex user instructions and interdependent subtasks more effectively, we propose a hierarchical multi-agent collaboration architecture that decomposes decision-making into Instruction-Subtask-Action levels. Within this architecture, three agents (i.e., Manager, Progress and Decision) are responsible for instruction decomposition, progress tracking and step-by-step decision-making, respectively. Additionally, a Reflection agent provides timely bottom-up error feedback and adjustment. We also introduce a new benchmark, PC-Eval, with 25 real-world complex instructions. Empirical results on PC-Eval show that PC-Agent achieves a 32% absolute improvement in task success rate over previous state-of-the-art methods. The code will be publicly available.
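To make the Instruction-Subtask-Action hierarchy concrete, here is a minimal sketch of how the four agents might hand work to one another. It is an illustration under stated assumptions, not PC-Agent's implementation: the abstract gives no interfaces, so every function name (`manager`, `decision`, `progress`, `reflection`, `environment`, `run`) is hypothetical, each MLLM call is replaced by a trivial string stub, the Active Perception Module is not modeled, and routing Reflection feedback back to the Decision agent for a retry is an assumed design.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Subtask:
    description: str
    done: bool = False


@dataclass
class Trace:
    actions: list[str] = field(default_factory=list)
    progress: list[str] = field(default_factory=list)
    reflections: list[str] = field(default_factory=list)


def manager(instruction: str) -> list[Subtask]:
    """Hypothetical Manager agent: instruction -> ordered subtasks.
    A real system would prompt an MLLM; this stub splits on ' then '."""
    return [Subtask(p.strip()) for p in instruction.split(" then ") if p.strip()]


def decision(subtask: Subtask, feedback: str | None) -> str:
    """Hypothetical Decision agent: pick the next action for a subtask,
    adjusting (here: marking a retry) when Reflection reported an error."""
    suffix = " (retry)" if feedback else ""
    return f"do: {subtask.description}{suffix}"


def environment(action: str, flaky: frozenset[str]) -> bool:
    """Simulated PC: the first attempt at a 'flaky' subtask fails."""
    return not any(action == f"do: {name}" for name in flaky)


def reflection(action: str, succeeded: bool) -> str | None:
    """Hypothetical Reflection agent: bottom-up error feedback, or None if OK."""
    return None if succeeded else f"'{action}' had no visible effect; adjust and retry"


def progress(trace: Trace, subtask: Subtask) -> None:
    """Hypothetical Progress agent: record which subtasks are complete."""
    trace.progress.append(f"completed: {subtask.description}")


def run(instruction: str, flaky: frozenset[str] = frozenset(), max_steps: int = 3) -> Trace:
    trace = Trace()
    # Instruction level: the Manager decomposes the instruction into subtasks.
    for subtask in manager(instruction):
        feedback = None
        # Action level: the Decision agent acts until Reflection reports success.
        for _ in range(max_steps):
            action = decision(subtask, feedback)
            trace.actions.append(action)
            feedback = reflection(action, environment(action, flaky))
            if feedback is None:
                subtask.done = True
                # Subtask level: the Progress agent records completion.
                progress(trace, subtask)
                break
            trace.reflections.append(feedback)
    return trace


if __name__ == "__main__":
    t = run("open Excel then enter data then save file", flaky=frozenset({"save file"}))
    print(t.actions)
    print(t.reflections)
```

With `"save file"` marked flaky, the first `do: save file` attempt fails, the Reflection stub emits feedback, and the Decision stub retries, so the trace contains four actions and one reflection. The point of the structure is that an error is caught and corrected at the action level without re-running the Manager's decomposition.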
