Build the web for agents, not agents for the web

Abstract

Recent advancements in Large Language Models (LLMs) and their multimodal counterparts have spurred significant interest in developing web agents -- AI systems capable of autonomously navigating and completing tasks within web environments. While holding tremendous promise for automating complex web interactions, current approaches face substantial challenges due to the fundamental mismatch between human-designed interfaces and LLM capabilities. Current methods struggle with the inherent complexity of web inputs, whether processing massive DOM trees, relying on screenshots augmented with additional information, or bypassing the user interface entirely through API interactions. This position paper advocates for a paradigm shift in web agent research: rather than forcing web agents to adapt to interfaces designed for humans, we should develop a new interaction paradigm specifically optimized for agentic capabilities. To this end, we introduce the concept of an Agentic Web Interface (AWI), an interface specifically designed for agents to navigate a website. We establish six guiding principles for AWI design, emphasizing safety, efficiency, and standardization, to account for the interests of all primary stakeholders. This reframing aims to overcome fundamental limitations of existing interfaces, paving the way for more efficient, reliable, and transparent web agent design, an effort that will require collaboration across the broader ML community.
