Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents

Abstract

Multimodal large-scale models have significantly advanced the development of web agents, enabling them to perceive and interact with digital environments in a manner akin to human cognition. In this paper, we argue that web agents must first acquire sufficient knowledge before they can effectively engage in cognitive reasoning. We therefore decompose a web agent's capabilities into two essential stages: knowledge content learning and cognitive processes. To formalize this, we propose the Web-CogKnowledge Framework, which categorizes knowledge as Factual, Conceptual, and Procedural. In this framework, knowledge content learning corresponds to the agent's processes of Memorizing and Understanding, which rely on the first two knowledge types and represent the "what" of learning. Cognitive processes, by contrast, correspond to Exploring, which is grounded in Procedural knowledge and defines the "how" of reasoning and action. To facilitate knowledge acquisition, we construct the Web-CogDataset, a structured resource curated from 14 real-world websites and designed to systematically instill the core knowledge a web agent needs. This dataset serves as the agent's conceptual grounding (the "nouns" upon which comprehension is built) as well as the basis for learning how to reason and act. Building on this foundation, we operationalize these processes through a novel knowledge-driven Chain-of-Thought (CoT) reasoning framework, with which we develop and train our proposed agent, the Web-CogReasoner. Extensive experiments show that it significantly outperforms existing models, especially when generalizing to unseen tasks where structured knowledge is decisive. To enable rigorous evaluation, we introduce the Web-CogBench, a comprehensive evaluation suite designed to assess and compare agent performance across the delineated knowledge domains and cognitive capabilities. Our code and data are open-sourced at https://github.com/Gnonymous/Web-CogReasoner
