CORRECT: COndensed eRror RECognition via knowledge Transfer in multi-agent systems

Abstract

Multi-agent systems (MAS) are increasingly capable of tackling complex real-world tasks, yet their reliance on inter-agent coordination, tool use, and long-horizon reasoning makes error recognition particularly challenging. Minor errors can propagate across agents and escalate into task failures, producing long, intertwined execution trajectories that are costly for both human developers and automated systems to debug and analyze. Our key insight is that, despite surface differences in failure trajectories (e.g., logs), MAS errors often recur with similar structural patterns. This paper presents CORRECT, the first lightweight, training-free framework that leverages an online cache of distilled error schemata to recognize failure structures and transfer that knowledge to new requests. This cache-based reuse allows LLMs to perform targeted error localization at inference time, avoiding expensive retraining while adapting to dynamic MAS deployments in under a second. To support rigorous study in this domain, we also introduce CORRECT-Error, a large-scale dataset of over 2,000 annotated trajectories collected through a novel error-injection pipeline guided by real-world distributions and validated through human evaluation to ensure alignment with natural failure patterns. Experiments across seven diverse MAS applications show that CORRECT improves step-level error localization by up to 19.8% over existing methods at near-zero overhead, substantially narrowing the gap between automated and human-level error recognition.
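
The cache-based reuse described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `ErrorSchema`, `SchemaCache`, `overlap`, and `build_prompt` are hypothetical, and a simple token-overlap score stands in for whatever matching CORRECT actually uses. The sketch only shows the general idea of retrieving a stored error schema for a new failure trajectory and passing it to an LLM as a localization hint.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorSchema:
    """Hypothetical distilled description of a recurring failure pattern."""
    name: str
    signature: frozenset  # tokens that characterise the failure (assumption)
    hint: str             # guidance handed to the LLM at inference time


def tokenize(trajectory: list[str]) -> frozenset:
    """Lower-cased whitespace tokens over all steps; a crude stand-in for an embedding."""
    return frozenset(tok for step in trajectory for tok in step.lower().split())


def overlap(query: frozenset, signature: frozenset) -> float:
    """Fraction of the schema's signature tokens that appear in the trajectory."""
    return len(query & signature) / len(signature) if signature else 0.0


@dataclass
class SchemaCache:
    """Online cache: schemas are appended as failures are analysed, no retraining."""
    schemas: list[ErrorSchema] = field(default_factory=list)

    def add(self, schema: ErrorSchema) -> None:
        self.schemas.append(schema)

    def lookup(self, trajectory: list[str], k: int = 1) -> list[ErrorSchema]:
        """Return the k cached schemas that best match the new trajectory."""
        query = tokenize(trajectory)
        ranked = sorted(
            self.schemas,
            key=lambda s: overlap(query, s.signature),
            reverse=True,
        )
        return ranked[:k]


def build_prompt(trajectory: list[str], schemas: list[ErrorSchema]) -> str:
    """Assemble an error-localization prompt that carries the retrieved hints."""
    steps = "\n".join(f"[{i}] {step}" for i, step in enumerate(trajectory))
    hints = "\n".join(f"- {s.name}: {s.hint}" for s in schemas)
    return (
        "Known failure patterns:\n" + hints + "\n\n"
        "Trajectory:\n" + steps + "\n\n"
        "Which step index contains the decisive error?"
    )


# Illustrative usage with made-up schemas and a made-up trajectory.
cache = SchemaCache()
cache.add(ErrorSchema(
    "stale_tool_result",
    frozenset({"tool", "cached", "stale"}),
    "Check whether an agent reused an outdated tool result.",
))
cache.add(ErrorSchema(
    "wrong_handoff",
    frozenset({"handoff", "wrong", "agent"}),
    "Check whether a task was routed to the wrong agent.",
))

trajectory = [
    "planner: handoff task to coder agent",
    "coder: wrong agent received request, cannot browse",
]
matches = cache.lookup(trajectory, k=1)
prompt = build_prompt(trajectory, matches)
```

The resulting `prompt` string would then be sent to an LLM of choice. Because the cache is an ordinary in-memory list, adding a schema costs a single append, which is consistent in spirit with the training-free, low-overhead design the abstract describes.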
