Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving
July 8, 2025
Authors: Xiangru Tang, Tianrui Qin, Tianhao Peng, Ziyang Zhou, Daniel Shao, Tingting Du, Xinming Wei, Peng Xia, Fang Wu, He Zhu, Ge Zhang, Jiaheng Liu, Xingyao Wang, Sirui Hong, Chenglin Wu, Hao Cheng, Chi Wang, Wangchunshu Zhou
cs.AI
Abstract
As language agents tackle increasingly complex tasks, they struggle with
effective error correction and experience reuse across domains. We introduce
Agent KB, a hierarchical experience framework that enables complex agentic
problem solving via a novel Reason-Retrieve-Refine pipeline. Agent KB addresses
a core limitation: agents traditionally cannot learn from each other's
experiences. By capturing both high-level strategies and detailed execution
logs, Agent KB creates a shared knowledge base that enables cross-agent
knowledge transfer. Evaluated on the GAIA benchmark, Agent KB improves success
rates by up to 16.28 percentage points. On the most challenging tasks, Claude-3
improves from 38.46% to 57.69%, while GPT-4 improves from 53.49% to 73.26% on
intermediate tasks. On SWE-bench code repair, Agent KB enables Claude-3 to
improve from 41.33% to 53.33%. Our results suggest that Agent KB provides a
modular, framework-agnostic infrastructure for enabling agents to learn from
past experiences and generalize successful strategies to new tasks.
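
The reported improvements mix percentages and percentage points, so it can help to work out the gains explicitly. The sketch below is only an illustration: it uses the three before/after pairs quoted in the abstract, and the dictionary labels and the `gains` helper are names invented here rather than anything from the paper. The abstract does not say which configuration the "up to 16.28 percentage points" figure refers to, so that number is not recomputed.

```python
# Success rates (%) quoted in the abstract: (baseline, with Agent KB).
# The keys are descriptive labels chosen for this sketch, not names from the paper.
reported = {
    "GAIA, most challenging tasks (Claude-3)": (38.46, 57.69),
    "GAIA, intermediate tasks (GPT-4)": (53.49, 73.26),
    "SWE-bench code repair (Claude-3)": (41.33, 53.33),
}


def gains(baseline, improved):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = improved - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 2), round(relative, 1)


# Hypothetical helper structure: label -> (percentage-point gain, relative gain).
summary = {name: gains(before, after) for name, (before, after) in reported.items()}

for name, (pp, rel) in summary.items():
    print(f"{name}: +{pp:.2f} pp ({rel:.1f}% relative)")
```

Absolute gains in percentage points are the appropriate unit when comparing against the abstract's headline figure. The relative gain is shown only to make the distinction between the two measures concrete.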