Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence

June 18, 2025
Authors: Yining Hong, Rui Sun, Bingxuan Li, Xingcheng Yao, Maxine Wu, Alexander Chien, Da Yin, Ying Nian Wu, Zhecan James Wang, Kai-Wei Chang
cs.AI

Abstract

AI agents today are mostly siloed: they either retrieve and reason over vast amounts of digital information and knowledge obtained online, or interact with the physical world through embodied perception, planning, and action, but rarely both. This separation limits their ability to solve tasks that require integrated physical and digital intelligence, such as cooking from online recipes, navigating with dynamic map data, or interpreting real-world landmarks using web knowledge. We introduce Embodied Web Agents, a novel paradigm for AI agents that fluidly bridge embodiment and web-scale reasoning. To operationalize this concept, we first develop the Embodied Web Agents task environments, a unified simulation platform that tightly integrates realistic 3D indoor and outdoor environments with functional web interfaces. Building upon this platform, we construct and release the Embodied Web Agents Benchmark, which encompasses a diverse suite of tasks including cooking, navigation, shopping, tourism, and geolocation, all requiring coordinated reasoning across physical and digital realms, for systematic assessment of cross-domain intelligence. Experimental results reveal significant performance gaps between state-of-the-art AI systems and human capabilities, establishing both challenges and opportunities at the intersection of embodied cognition and web-scale knowledge access. All datasets, code, and websites are publicly available at our project page: https://embodied-web-agent.github.io/.