Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence
June 18, 2025
Authors: Yining Hong, Rui Sun, Bingxuan Li, Xingcheng Yao, Maxine Wu, Alexander Chien, Da Yin, Ying Nian Wu, Zhecan James Wang, Kai-Wei Chang
cs.AI
Abstract
AI agents today are mostly siloed: they either retrieve and reason over vast
amounts of digital information and knowledge obtained online, or interact with
the physical world through embodied perception, planning, and action, but
rarely both. This separation limits their ability to solve tasks that require
integrated physical and digital intelligence, such as cooking from online
recipes, navigating with dynamic map data, or interpreting real-world landmarks
using web knowledge. We introduce Embodied Web Agents, a novel paradigm for AI
agents that fluidly bridge embodiment and web-scale reasoning. To
operationalize this concept, we first develop the Embodied Web Agents task
environments, a unified simulation platform that tightly integrates realistic
3D indoor and outdoor environments with functional web interfaces. Building
upon this platform, we construct and release the Embodied Web Agents Benchmark,
which encompasses a diverse suite of tasks including cooking, navigation,
shopping, tourism, and geolocation, all requiring coordinated reasoning across
physical and digital realms for systematic assessment of cross-domain
intelligence. Experimental results reveal significant performance gaps between
state-of-the-art AI systems and human capabilities, establishing both
challenges and opportunities at the intersection of embodied cognition and
web-scale knowledge access. All datasets, code, and websites are publicly
available on our project page, https://embodied-web-agent.github.io/.