Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents

Abstract

Multimodal large-scale models have significantly advanced the development of web agents, enabling them to perceive and interact with digital environments in a manner akin to human cognition. In this paper, we argue that web agents must first acquire sufficient knowledge before they can engage effectively in cognitive reasoning. We therefore decompose a web agent's capabilities into two essential stages: knowledge content learning and cognitive processes. To formalize this, we propose the Web-CogKnowledge Framework, which categorizes knowledge as Factual, Conceptual, and Procedural. In this framework, knowledge content learning corresponds to the agent's processes of Memorizing and Understanding, which rely on the first two knowledge types and represent the "what" of learning. Cognitive processes, in turn, correspond to Exploring, which is grounded in Procedural knowledge and defines the "how" of reasoning and action. To facilitate knowledge acquisition, we construct the Web-CogDataset, a structured resource curated from 14 real-world websites and designed to systematically instill the core knowledge a web agent needs. This dataset serves as the agent's conceptual grounding (the "nouns" upon which comprehension is built) as well as the basis for learning how to reason and act. Building on this foundation, we operationalize these processes through a novel knowledge-driven Chain-of-Thought (CoT) reasoning framework, with which we develop and train our proposed agent, the Web-CogReasoner. Extensive experiments show that it significantly outperforms existing models, especially when generalizing to unseen tasks where structured knowledge is decisive. To enable rigorous evaluation, we introduce the Web-CogBench, a comprehensive evaluation suite designed to assess and compare agent performance across the delineated knowledge domains and cognitive capabilities. Our code and data are open-sourced at https://github.com/Gnonymous/Web-CogReasoner.
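As an illustration only, the two-stage decomposition described in the abstract can be written down as a small lookup structure. This is a minimal sketch, not code from the Web-CogReasoner repository: the names `Knowledge`, `Process`, `Stage`, `STAGES`, and `stage_for` are hypothetical, and the mapping encodes only what the abstract states (which knowledge types and which processes belong to each stage).

```python
from dataclasses import dataclass
from enum import Enum


class Knowledge(Enum):
    # The three knowledge categories named in the abstract.
    FACTUAL = "Factual"
    CONCEPTUAL = "Conceptual"
    PROCEDURAL = "Procedural"


class Process(Enum):
    # The three agent processes named in the abstract.
    MEMORIZING = "Memorizing"
    UNDERSTANDING = "Understanding"
    EXPLORING = "Exploring"


@dataclass(frozen=True)
class Stage:
    # Hypothetical container type; not part of the released code.
    name: str
    role: str  # the "what" or the "how", as phrased in the abstract
    processes: frozenset
    knowledge: frozenset


STAGES = (
    Stage(
        name="knowledge content learning",
        role="what",
        processes=frozenset({Process.MEMORIZING, Process.UNDERSTANDING}),
        knowledge=frozenset({Knowledge.FACTUAL, Knowledge.CONCEPTUAL}),
    ),
    Stage(
        name="cognitive processes",
        role="how",
        processes=frozenset({Process.EXPLORING}),
        knowledge=frozenset({Knowledge.PROCEDURAL}),
    ),
)


def stage_for(knowledge: Knowledge) -> Stage:
    """Return the stage that the abstract associates with a knowledge type."""
    for stage in STAGES:
        if knowledge in stage.knowledge:
            return stage
    raise ValueError(f"no stage registered for {knowledge!r}")
```

For example, `stage_for(Knowledge.PROCEDURAL).name` yields the cognitive-processes stage, while both Factual and Conceptual knowledge resolve to knowledge content learning.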
