Nested Browser-Use Learning for Agentic Information Seeking
December 29, 2025
Authors: Baixuan Li, Jialong Wu, Wenbiao Yin, Kuan Li, Zhongwang Zhang, Huifeng Yin, Zhengwei Tao, Liwen Zhang, Pengjun Xie, Jingren Zhou, Yong Jiang
cs.AI
Abstract
Information-seeking (IS) agents have achieved strong performance across a range of wide and deep search tasks, yet their tool use remains largely restricted to API-level snippet retrieval and URL-based page fetching, limiting access to the richer information available through real browsing. While full browser interaction could unlock deeper capabilities, its fine-grained control and verbose page content returns introduce substantial complexity for ReAct-style function-calling agents. To bridge this gap, we propose Nested Browser-Use Learning (NestBrowse), which introduces a minimal and complete browser-action framework that decouples interaction control from page exploration through a nested structure. This design simplifies agentic reasoning while enabling effective deep-web information acquisition. Empirical results on challenging deep IS benchmarks demonstrate that NestBrowse offers clear benefits in practice. Further in-depth analyses underscore its efficiency and flexibility.