AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent

Abstract



Large language models (LLMs) have fueled many intelligent agent tasks, such as web navigation, but most existing agents perform far from satisfactorily on real-world webpages due to three factors: (1) the versatility of actions on webpages, (2) HTML text exceeding model processing capacity, and (3) the complexity of decision-making due to the open-domain nature of the web. In light of these challenges, we develop AutoWebGLM, a GPT-4-outperforming automated web navigation agent built upon ChatGLM3-6B. Inspired by human browsing patterns, we design an HTML simplification algorithm to represent webpages, preserving vital information succinctly. We employ a hybrid human-AI method to build web browsing data for curriculum training. Then, we bootstrap the model by reinforcement learning and rejection sampling to further facilitate webpage comprehension, browser operations, and efficient task decomposition by the model itself. For testing, we establish a bilingual benchmark, AutoWebBench, for real-world web browsing tasks. We evaluate AutoWebGLM across diverse web navigation benchmarks, revealing its improvements but also the underlying challenges of tackling real environments. Related code, model, and data will be released at https://github.com/THUDM/AutoWebGLM.
