

CORRECT: COndensed eRror RECognition via knowledge Transfer in multi-agent systems

September 28, 2025
Authors: Yifan Yu, Moyan Li, Shaoyuan Xu, Jinmiao Fu, Xinhai Hou, Fan Lai, Bryan Wang
cs.AI

Abstract

Multi-agent systems (MAS) are increasingly capable of tackling complex real-world tasks, yet their reliance on inter-agent coordination, tool use, and long-horizon reasoning makes error recognition particularly challenging. Minor errors can propagate across agents and escalate into task failures, while producing long, intertwined execution trajectories that are costly for both human developers and automated systems to debug and analyze. Our key insight is that, despite surface differences in failure trajectories (e.g., logs), MAS errors often recur with similar structural patterns. This paper presents CORRECT, the first lightweight, training-free framework that leverages an online cache of distilled error schemata to recognize failure structures and transfer that knowledge to new requests. This cache-based reuse allows LLMs to perform targeted error localization at inference time, avoiding expensive retraining while adapting to dynamic MAS deployments in under a second. To support rigorous study in this domain, we also introduce CORRECT-Error, a large-scale dataset of over 2,000 annotated trajectories collected through a novel error-injection pipeline guided by real-world distributions and further validated through human evaluation to ensure alignment with natural failure patterns. Experiments across seven diverse MAS applications show that CORRECT improves step-level error localization by up to 19.8% over existing approaches at near-zero overhead, substantially narrowing the gap between automated and human-level error recognition.
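The abstract describes the cache-based mechanism only at a high level. As a rough illustration, the Python sketch below shows one way an online cache of error schemata could be queried to localize a faulty step in a trajectory, together with a simple step-level accuracy metric. This is an assumption-laden toy, not the authors' implementation: `ErrorSchema`, `SchemaCache`, `step_features`, `localize`, and `step_accuracy` are hypothetical names; schemata are modeled as sets of `key=value` tokens; and Jaccard similarity stands in for the LLM reasoning that, per the abstract, CORRECT performs over retrieved schemata at inference time.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ErrorSchema:
    """Hypothetical distilled error pattern: abstract feature tokens plus a description."""
    features: frozenset[str]
    description: str


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    # Set-overlap similarity; defined as 0.0 when both sets are empty.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class SchemaCache:
    """Hypothetical online cache: schemata are appended as new failures are diagnosed."""
    schemata: list[ErrorSchema] = field(default_factory=list)

    def add(self, schema: ErrorSchema) -> None:
        self.schemata.append(schema)

    def retrieve(self, features: frozenset[str], k: int = 2) -> list[ErrorSchema]:
        # Rank cached schemata by similarity to the query; keep the top k with nonzero overlap.
        ranked = sorted(self.schemata, key=lambda s: jaccard(s.features, features), reverse=True)
        return [s for s in ranked[:k] if jaccard(s.features, features) > 0]


def step_features(step: dict[str, str]) -> frozenset[str]:
    # Assumed abstraction: each log step is already reduced to coarse fields
    # (e.g. agent role, action type, outcome) rather than raw text.
    return frozenset(f"{key}={value}" for key, value in step.items())


def localize(trajectory: list[dict[str, str]], cache: SchemaCache) -> int | None:
    """Return the earliest step index with the highest schema similarity, or None."""
    best_step, best_score = None, 0.0
    for i, step in enumerate(trajectory):
        feats = step_features(step)
        for schema in cache.retrieve(feats, k=1):
            score = jaccard(schema.features, feats)
            # Strict '>' keeps the earlier step when scores tie.
            if score > best_score:
                best_step, best_score = i, score
    return best_step


def step_accuracy(predicted: list[int | None], gold: list[int]) -> float:
    # Fraction of trajectories whose predicted error step equals the annotated step.
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Toy usage with one cached schema (illustrative data, not from CORRECT-Error).
cache = SchemaCache()
cache.add(ErrorSchema(
    frozenset({"role=planner", "action=tool_call", "outcome=empty_result"}),
    "planner proceeds on an empty tool result",
))
trajectory = [
    {"role": "planner", "action": "plan", "outcome": "ok"},
    {"role": "planner", "action": "tool_call", "outcome": "empty_result"},
    {"role": "executor", "action": "answer", "outcome": "ok"},
]
print(localize(trajectory, cache))  # prints 1
```

Because the cache is just a growing list, newly diagnosed failures can be added at any time without retraining, which is the property the abstract emphasizes; a real system would also need a principled way to distill schemata from logs and to bound cache size.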
October 1, 2025