Recon-Act: A Self-Evolving Multi-Agent Browser-Use System via Web Reconnaissance, Tool Generation, and Task Execution
September 25, 2025
Authors: Kaiwen He, Zhiwei Wang, Chenyi Zhuang, Jinjie Gu
cs.AI
Abstract
In recent years, multimodal models have made remarkable strides, paving the way
for intelligent browser-use agents. However, when solving tasks on real-world
webpages over multi-turn, long-horizon trajectories, current agents still suffer
from disordered action sequencing and excessive trial and error during
execution. This paper introduces Recon-Act, a self-evolving multi-agent
framework grounded in the Reconnaissance-Action behavioral paradigm. The system
comprises a Reconnaissance Team and an Action Team: the former conducts
comparative analysis and tool generation, while the latter handles intent
decomposition, tool orchestration, and execution. By contrasting erroneous
trajectories with successful ones, the Reconnaissance Team infers remedies,
abstracts them into a unified notion of generalized tools, expressed either as
hints or as rule-based code, and registers them in the tool archive in real
time. The Action Team then re-runs inference on the task, equipped with these
targeted tools, thus establishing a closed-loop training pipeline of
data-tools-action-feedback. Following the six-level implementation roadmap
proposed in this work, we have currently reached Level 3 (with limited
human-in-the-loop intervention). Leveraging generalized tools obtained through
reconnaissance, Recon-Act substantially improves adaptability to unseen
websites and solvability on long-horizon tasks, and achieves state-of-the-art
performance on the challenging VisualWebArena dataset.
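
The data-tools-action-feedback loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names `Tool`, `ToolArchive`, `recon`, `act`, and `self_evolve` are hypothetical, trajectories are reduced to lists of action strings, and the "comparative analysis" is simply locating the first step where a failed trajectory diverges from a successful one. In the paper, these roles are presumably filled by model-driven agents rather than string comparison.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Tool:
    """Hypothetical generalized tool: a rule (step -> action) plus a hint."""
    name: str
    step: int
    replacement: str
    hint: str


@dataclass
class ToolArchive:
    """Hypothetical tool archive; tools are registered as they are generated."""
    tools: Dict[str, Tool] = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool


def recon(failed: List[str], succeeded: List[str]) -> Optional[Tool]:
    """Toy reconnaissance: contrast two trajectories and emit one remedy."""
    for i, (bad, good) in enumerate(zip(failed, succeeded)):
        if bad != good:
            return Tool(
                name=f"fix_step_{i}",
                step=i,
                replacement=good,
                hint=f"At step {i}, prefer '{good}' over '{bad}'.",
            )
    return None  # no divergence found in the shared prefix


def act(plan: List[str], archive: ToolArchive) -> List[str]:
    """Toy action team: execute the plan, letting archived rules override steps."""
    overrides = {t.step: t.replacement for t in archive.tools.values()}
    return [overrides.get(i, action) for i, action in enumerate(plan)]


def self_evolve(plan: List[str], reference: List[str],
                archive: ToolArchive, max_rounds: int = 5) -> List[str]:
    """Closed loop: act, compare against a successful trajectory, add a tool, retry."""
    for _ in range(max_rounds):
        trajectory = act(plan, archive)
        if trajectory == reference:
            return trajectory
        tool = recon(trajectory, reference)
        if tool is None:
            break
        archive.register(tool)
    return act(plan, archive)


# Illustrative data only; these action names are made up.
plan = ["open_site", "click_first_result", "add_to_cart"]
reference = ["open_site", "filter_by_price", "add_to_cart"]
archive = ToolArchive()
final = self_evolve(plan, reference, archive)
print(final, sorted(archive.tools))
```

The sketch keeps both tool forms from the abstract in one record: `replacement` plays the role of rule-based code, while `hint` is the natural-language form that a model-driven agent could consume instead.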