GraphTracer: Graph-Guided Failure Tracing in LLM Agents for Robust Multi-Turn Deep Search
October 12, 2025
Authors: Heng Zhang, Yuling Shi, Xiaodong Gu, Haochen You, Zijian Zhang, Lubin Gan, Yilei Yuan, Jin Huang
cs.AI
Abstract
Multi-agent systems powered by Large Language Models excel at complex tasks
through coordinated collaboration, yet they face high failure rates in
multi-turn deep search scenarios. Existing temporal attribution methods
struggle to accurately diagnose root causes, particularly when errors propagate
across multiple agents. Attempts to automate failure attribution by analyzing
action sequences remain ineffective due to their inability to account for
information dependencies that span agents. This paper identifies two core
challenges: (i) distinguishing symptoms from root causes in multi-agent
error propagation, and (ii) tracing information dependencies beyond
temporal order. To address these issues, we introduce GraphTracer, a
framework that redefines failure attribution through information flow analysis.
GraphTracer constructs Information Dependency Graphs (IDGs) to explicitly
capture how agents reference and build on prior outputs. It localizes root
causes by tracing through these dependency structures instead of relying on
temporal sequences. GraphTracer also uses graph-aware synthetic data generation
to target critical nodes, creating realistic failure scenarios. Evaluations on
the Who&When benchmark and integration into production systems demonstrate
that GraphTracer-8B achieves up to 18.18% higher attribution accuracy compared
to state-of-the-art models and enables 4.8% to 14.2% performance improvements
in deployed multi-agent frameworks, establishing a robust solution for
multi-agent system debugging.
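
The contrast between temporal attribution and dependency-based attribution can be illustrated with a small sketch. This is not the GraphTracer implementation, which the abstract does not detail; the `Step` schema, the `faulty` flag (assumed to come from some external per-step check), and both function names are hypothetical. The sketch only shows why following "which earlier output did this step use" edges can separate a root cause from its downstream symptoms and from unrelated earlier errors.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Step:
    """One agent action in a multi-agent trace (hypothetical schema)."""
    step_id: int
    agent: str
    refs: list = field(default_factory=list)  # ids of earlier steps whose output this step used
    faulty: bool = False  # assumed label from an external per-step check


def upstream_steps(steps, start_id):
    """Return the ids of start_id and every step it depends on, directly or transitively."""
    by_id = {s.step_id: s for s in steps}
    seen = {start_id}
    queue = deque([start_id])
    while queue:
        current = queue.popleft()
        for ref in by_id[current].refs:
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return seen


def root_cause_candidates(steps, failed_id):
    """Faulty steps upstream of the failure that have no faulty step upstream of themselves."""
    by_id = {s.step_id: s for s in steps}
    upstream = upstream_steps(steps, failed_id)
    roots = []
    for i in upstream:
        if not by_id[i].faulty:
            continue
        ancestors = upstream_steps(steps, i) - {i}
        if not any(by_id[a].faulty for a in ancestors):
            roots.append(i)
    return sorted(roots)


# Toy trace: step 1 is an early error on a branch the failing answer never used.
trace = [
    Step(0, "planner"),
    Step(1, "browser", refs=[0], faulty=True),     # unrelated branch
    Step(2, "searcher", refs=[0], faulty=True),    # actual root cause
    Step(3, "summarizer", refs=[2], faulty=True),  # symptom of step 2
    Step(4, "answerer", refs=[3], faulty=True),    # observed failure
]

print(root_cause_candidates(trace, failed_id=4))  # prints [2]
```

In this toy trace, an "earliest faulty step" heuristic based on time order alone would blame step 1, even though its output never reaches the final answer. Walking the dependency edges excludes step 1 and treats steps 3 and 4 as symptoms of step 2.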