GraphTracer: Graph-Guided Failure Tracing in LLM Agents for Robust Multi-Turn Deep Search
October 12, 2025
Authors: Heng Zhang, Yuling Shi, Xiaodong Gu, Haochen You, Zijian Zhang, Lubin Gan, Yilei Yuan, Jin Huang
cs.AI
Abstract
Multi-agent systems powered by Large Language Models excel at complex tasks
through coordinated collaboration, yet they face high failure rates in
multi-turn deep search scenarios. Existing temporal attribution methods
struggle to accurately diagnose root causes, particularly when errors propagate
across multiple agents. Attempts to automate failure attribution by analyzing
action sequences remain ineffective due to their inability to account for
information dependencies that span agents. This paper identifies two core
challenges: (i) distinguishing symptoms from root causes in multi-agent
error propagation, and (ii) tracing information dependencies beyond
temporal order. To address these issues, we introduce GraphTracer, a
framework that redefines failure attribution through information flow analysis.
GraphTracer constructs Information Dependency Graphs (IDGs) to explicitly
capture how agents reference and build on prior outputs. It localizes root
causes by tracing through these dependency structures instead of relying on
temporal sequences. GraphTracer also uses graph-aware synthetic data generation
to target critical nodes, creating realistic failure scenarios. Evaluations on
the Who&When benchmark and integration into production systems demonstrate
that GraphTracer-8B achieves up to 18.18% higher attribution accuracy compared
to state-of-the-art models and enables 4.8% to 14.2% performance improvements
in deployed multi-agent frameworks, establishing a robust solution for
multi-agent system debugging.
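
The abstract does not spell out how an IDG is represented or traversed, so the following is only a toy sketch of the general idea, not the authors' implementation. It assumes each step in a multi-agent trace records which earlier steps it referenced, and that some hypothetical per-step check has already flagged a set of steps as faulty. The names `ancestors`, `trace_root_causes`, `idg`, and `faulty` are all illustrative. A flagged step is treated as a root cause when none of the steps it depends on, directly or transitively, is also flagged; flagged steps with a flagged ancestor are treated as symptoms.

```python
from collections import deque


def ancestors(idg, node):
    """Return every step that `node` depends on, directly or transitively."""
    seen = set()
    queue = deque(idg[node])
    while queue:
        cur = queue.popleft()
        if cur in seen:
            continue
        seen.add(cur)
        queue.extend(idg[cur])
    return seen


def trace_root_causes(idg, faulty, failed_step):
    """Follow dependency edges backward from the failed step.

    A flagged step counts as a root cause only if no step upstream
    of it is flagged as well (an assumption made for this sketch).
    """
    upstream = ancestors(idg, failed_step) | {failed_step}
    candidates = upstream & faulty
    return sorted(n for n in candidates if not (ancestors(idg, n) & faulty))


# Hypothetical trace: step id -> ids of the earlier steps it references.
idg = {
    0: set(),    # planner issues the task
    1: {0},      # search agent A (introduces a wrong fact)
    2: {0},      # search agent B (correct)
    3: {1},      # summarizer builds on A's output
    4: {3, 2},   # final answer uses the summary and B's output
}
faulty = {1, 3, 4}  # assumed output of a per-step correctness check

print(trace_root_causes(idg, faulty, failed_step=4))
```

In this toy trace, step 2 immediately precedes step 3 in time but is not among its dependencies, so a purely temporal reading could point at the wrong agent. Following the reference edges instead leads from the failed answer back through step 3 to step 1.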