Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation

October 17, 2024
Authors: Hyungjoo Chae, Namyoung Kim, Kai Tzu-iunn Ong, Minju Gwak, Gwanwoo Song, Jihoon Kim, Sunghwan Kim, Dongha Lee, Jinyoung Yeo
cs.AI

Abstract

Large language models (LLMs) have recently gained much attention in building autonomous agents. However, the performance of current LLM-based web agents in long-horizon tasks is far from optimal, often yielding errors such as repeatedly buying a non-refundable flight ticket. By contrast, humans can avoid such irreversible mistakes because we are aware of the potential outcomes of our actions (e.g., losing money), an ability also known as a "world model". Motivated by this, our study begins with preliminary analyses that confirm the absence of world models in current LLMs (e.g., GPT-4o and Claude-3.5-Sonnet). Then, we present a World-model-augmented (WMA) web agent, which simulates the outcomes of its actions for better decision-making. To overcome the challenges in training LLMs as world models predicting next observations, such as repeated elements across observations and long HTML inputs, we propose a transition-focused observation abstraction, where the prediction objectives are free-form natural language descriptions exclusively highlighting important state differences between time steps. Experiments on WebArena and Mind2Web show that our world models improve agents' policy selection without training and demonstrate our agents' cost- and time-efficiency compared to recent tree-search-based agents.
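
The idea in the abstract, simulating the outcome of each candidate action and then choosing the action with the most promising outcome, can be illustrated with a toy sketch. This is not the authors' implementation: the page contents, the lookup-table "world model", the keyword-based value function, and the names `describe_transition`, `toy_world_model`, `toy_value_fn`, and `select_action` are all assumptions made for illustration. In the paper's setting the world model and the value estimate would presumably come from LLMs rather than hard-coded rules.

```python
from typing import Callable, Dict, List, Sequence


def describe_transition(before: Sequence[str], after: Sequence[str]) -> str:
    """Toy transition-focused abstraction: report only what changed between steps."""
    before_set, after_set = set(before), set(after)
    added = [line for line in after if line not in before_set]
    removed = [line for line in before if line not in after_set]
    parts: List[str] = []
    if added:
        parts.append("added: " + "; ".join(added))
    if removed:
        parts.append("removed: " + "; ".join(removed))
    return " | ".join(parts) if parts else "no visible change"


# Hypothetical current page: one ticket has already been bought.
CURRENT_PAGE: List[str] = [
    "order 1: flight ticket (non-refundable)",
    "button: Buy ticket",
    "button: View orders",
]

# Hypothetical stand-in for a learned world model: a hard-coded table of
# predicted next pages, one entry per candidate action.
PREDICTED_NEXT: Dict[str, List[str]] = {
    "click [Buy ticket]": [
        "order 1: flight ticket (non-refundable)",
        "order 2: flight ticket (non-refundable)",
        "button: Buy ticket",
        "button: View orders",
    ],
    "click [View orders]": [
        "order 1: flight ticket (non-refundable)",
        "heading: Your orders",
    ],
}


def toy_world_model(action: str) -> str:
    """Predict the state change caused by an action, as a short text description."""
    return describe_transition(CURRENT_PAGE, PREDICTED_NEXT[action])


def toy_value_fn(description: str) -> float:
    """Assumed scoring rule: a change that mentions a non-refundable item is bad."""
    return 0.0 if "non-refundable" in description else 1.0


def select_action(
    candidates: Sequence[str],
    world_model: Callable[[str], str],
    value_fn: Callable[[str], float],
) -> str:
    """Simulate each candidate action and return the one with the best predicted outcome."""
    return max(candidates, key=lambda action: value_fn(world_model(action)))


chosen = select_action(list(PREDICTED_NEXT), toy_world_model, toy_value_fn)
print(chosen)
```

Because only the difference between the two pages is described, the unchanged "order 1" line never reaches the value function. The simulated purchase is penalized for adding a second non-refundable order, so the sketch picks the action that merely views the existing orders.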
