CORRECT: COndensed eRror RECognition via knowledge Transfer in multi-agent systems
September 28, 2025
Authors: Yifan Yu, Moyan Li, Shaoyuan Xu, Jinmiao Fu, Xinhai Hou, Fan Lai, Bryan Wang
cs.AI
Abstract
Multi-agent systems (MAS) are increasingly capable of tackling complex
real-world tasks, yet their reliance on inter-agent coordination, tool use, and
long-horizon reasoning makes error recognition particularly challenging. Minor
errors can propagate across agents, escalating into task failures while
producing long, intertwined execution trajectories that impose significant
costs for both human developers and automated systems to debug and analyze. Our
key insight is that, despite surface differences in failure trajectories (e.g.,
logs), MAS errors often recur with similar structural patterns. This paper
presents CORRECT, the first lightweight, training-free framework that leverages
an online cache of distilled error schemata to recognize and transfer knowledge
of failure structures across new requests. This cache-based reuse allows LLMs
to perform targeted error localization at inference time, avoiding the need for
expensive retraining while adapting to dynamic MAS deployments in under a second.
To support rigorous study in this domain, we also introduce CORRECT-Error, a
large-scale dataset of over 2,000 annotated trajectories collected through a
novel error-injection pipeline guided by real-world distributions, and further
validated through human evaluation to ensure alignment with natural failure
patterns. Experiments across seven diverse MAS applications show that CORRECT
improves step-level error localization by up to 19.8% over existing methods at
near-zero overhead, substantially narrowing the gap between automated and
human-level error recognition.
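
To make the cache-based reuse idea concrete, the following is a minimal sketch of how a cache of distilled error schemata might be queried for a new failure trajectory and turned into a targeted prompt. It is an illustration under stated assumptions, not the authors' implementation: the names `ErrorSchema`, `SchemaCache`, `coverage`, and `build_prompt` are hypothetical, keyword overlap stands in for whatever similarity measure the framework actually uses, and the LLM call itself is omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ErrorSchema:
    """Hypothetical distilled failure pattern (not the paper's actual format)."""
    name: str
    signature: frozenset  # keywords characterising the failure structure
    guidance: str         # hint handed to the LLM when this schema matches


def tokenize(text: str) -> frozenset:
    """Lowercased whitespace tokens; an assumed stand-in for a real encoder."""
    return frozenset(text.lower().split())


def coverage(tokens: frozenset, signature: frozenset) -> float:
    """Fraction of the schema's signature keywords found in the trajectory."""
    if not signature:
        return 0.0
    return len(tokens & signature) / len(signature)


@dataclass
class SchemaCache:
    """In-memory cache that can be extended online as new schemata are distilled."""
    schemata: List[ErrorSchema] = field(default_factory=list)

    def add(self, schema: ErrorSchema) -> None:
        self.schemata.append(schema)

    def retrieve(self, trajectory: List[str], k: int = 2) -> List[ErrorSchema]:
        """Return up to k schemata with non-zero overlap, best match first."""
        tokens = tokenize(" ".join(trajectory))
        scored = [(coverage(tokens, s.signature), s) for s in self.schemata]
        matched = [pair for pair in scored if pair[0] > 0]
        matched.sort(key=lambda pair: pair[0], reverse=True)
        return [schema for _, schema in matched[:k]]


def build_prompt(trajectory: List[str], schemata: List[ErrorSchema]) -> str:
    """Assemble a step-indexed prompt asking for the first erroneous step."""
    lines = ["Known failure patterns:"]
    lines += [f"- {s.name}: {s.guidance}" for s in schemata]
    lines.append("Trajectory:")
    lines += [f"[{i}] {step}" for i, step in enumerate(trajectory)]
    lines.append("Which step index contains the decisive error?")
    return "\n".join(lines)
```

In this sketch, adapting to a new deployment only requires calling `add` on the cache, so no model weights are touched; the resulting prompt would then be sent to whichever LLM performs the step-level localization.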