WebWalker: Benchmarking LLMs in Web Traversal

January 13, 2025
Authors: Jialong Wu, Wenbiao Yin, Yong Jiang, Zhenglin Wang, Zekun Xi, Runnan Fang, Deyu Zhou, Pengjun Xie, Fei Huang
cs.AI

Abstract

Retrieval-augmented generation (RAG) demonstrates remarkable performance across open-domain question-answering tasks. However, traditional search engines may retrieve shallow content, limiting the ability of LLMs to handle complex, multi-layered information. To address this, we introduce WebWalkerQA, a benchmark designed to assess the ability of LLMs to perform web traversal. It evaluates the capacity of LLMs to systematically traverse a website's subpages to extract high-quality data. We propose WebWalker, a multi-agent framework that mimics human-like web navigation through an explore-critic paradigm. Extensive experimental results show that WebWalkerQA is challenging and demonstrate the effectiveness of combining RAG with WebWalker through horizontal and vertical integration in real-world scenarios.
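
To make the explore-critic idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a toy in-memory site and replaces both LLM agents with keyword heuristics; the names `SITE`, `explorer_rank`, `critic_update`, and `walk` are hypothetical and do not come from the paper.

```python
# Hypothetical toy site: page path -> (page text, links to subpages).
# This data is invented for illustration only.
SITE = {
    "/": ("Welcome to the conference site.", ["/program", "/venue"]),
    "/program": ("Program overview.", ["/program/keynotes", "/program/workshops"]),
    "/venue": ("The venue is in the city centre.", []),
    "/program/keynotes": ("The keynote deadline is 1 March.", []),
    "/program/workshops": ("Workshop proposals are due 15 February.", []),
}


def explorer_rank(links, query_terms):
    """Stand-in for an explorer agent: order links by keyword overlap with the query.

    A real system would ask an LLM which link to click; this is an assumption.
    """
    def score(link):
        return sum(term in link for term in query_terms)

    return sorted(links, key=score, reverse=True)


def critic_update(memory, page_text, query_terms):
    """Stand-in for a critic agent: store relevant text, then judge sufficiency."""
    lowered = page_text.lower()
    if any(term in lowered for term in query_terms):
        memory.append(page_text)
    # Assumed stopping rule: every query term appears somewhere in memory.
    joined = " ".join(memory).lower()
    return all(term in joined for term in query_terms)


def walk(site, root, query_terms, max_steps=10):
    """Traverse subpages depth-first until the critic is satisfied or the budget runs out."""
    memory, visited, frontier = [], set(), [root]
    steps = 0
    while frontier and steps < max_steps:
        page = frontier.pop(0)
        if page in visited:
            continue
        visited.add(page)
        steps += 1
        text, links = site[page]
        if critic_update(memory, text, query_terms):
            break
        # Explore the most promising unvisited subpages of this page first.
        unvisited = [link for link in links if link not in visited]
        frontier = explorer_rank(unvisited, query_terms) + frontier
    return memory, visited


if __name__ == "__main__":
    memory, visited = walk(SITE, "/", ["workshop", "due"])
    print(memory)
    print(sorted(visited))
```

In this sketch the answer sits two clicks below the root page, which is the kind of deep content the abstract says a single search-engine query may miss. The split between a component that picks the next link and a component that decides when enough evidence has been gathered is only one plausible reading of "explore-critic".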
