WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization

July 20, 2025
Authors: Zhengwei Tao, Jialong Wu, Wenbiao Yin, Junkai Zhang, Baixuan Li, Haiyang Shen, Kuan Li, Liwen Zhang, Xinyu Wang, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou
cs.AI

Abstract

The advent of Large Language Model (LLM)-powered agents has revolutionized artificial intelligence by enabling solutions to complex, open-ended tasks through web-based information-seeking (IS) capabilities. However, the scarcity of high-quality training data has limited the development of IS agents. Existing approaches typically adopt an information-driven paradigm that first collects web data and then generates questions based on what was retrieved. This can lead to inconsistency between the information structure and the reasoning structure, and between the question and the answer. To mitigate this, we propose WebShaper, a formalization-driven IS data synthesis framework, and use it to construct a dataset. WebShaper systematically formalizes IS tasks through set theory. Central to the formalization is the concept of Knowledge Projections (KP), which enables precise control over the reasoning structure through compositions of KP operations. During synthesis, we begin by creating seed tasks and then apply a multi-step expansion process. At each step, an agentic Expander uses retrieval and validation tools to make the current formal question more complex, guided by our formalization. We train our model on the synthesized dataset. Experimental results demonstrate that WebShaper achieves state-of-the-art performance among open-source IS agents on the GAIA and WebWalkerQA benchmarks.
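The abstract does not define Knowledge Projections, so the following is only a toy sketch of how a set-theoretic formalization of an IS question might look. It assumes, purely for illustration, that a KP maps a relation and a set of target entities to the set of entities related to any of those targets, and that KPs compose by nesting and intersection. The function `project`, the relations, and all the facts are hypothetical and are not taken from the paper.

```python
from typing import Set, Tuple

# A relation is modeled as a set of (subject, object) pairs (an assumption).
Relation = Set[Tuple[str, str]]


def project(relation: Relation, targets: Set[str]) -> Set[str]:
    """Hypothetical knowledge projection: all subjects related to any target."""
    return {subj for (subj, obj) in relation if obj in targets}


# Toy facts, invented for this example.
played_for: Relation = {("alice", "team_a"), ("bob", "team_a"), ("carol", "team_b")}
based_in: Relation = {("team_a", "paris"), ("team_b", "rome")}
born_in: Relation = {("alice", "1990"), ("bob", "1985"), ("carol", "1991")}

# Seed question: "Who played for team_a?"
seed = project(played_for, {"team_a"})

# Expansion step 1: replace the constant "team_a" with a nested projection,
# i.e. "Who played for a team based in Paris?"
nested = project(played_for, project(based_in, {"paris"}))

# Expansion step 2: intersect with another constraint whose target set is a
# union of values, i.e. "... and was born in 1990 or 1991?"
expanded = nested & project(born_in, {"1990", "1991"})

print(sorted(seed))
print(sorted(nested))
print(sorted(expanded))
```

In this toy setting, each expansion step leaves the question as an explicit composition of set operations, so its reasoning structure stays inspectable; this is the kind of control the abstract attributes to the formalization, under the assumptions stated above.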