Escaping the Self-Confirmation Trap: An Execute-Distill-Verify Paradigm for Agentic Experience Learning
June 23, 2026
Authors: Shiding Zhu, Yudi Qi, Yajie Wang, Jiaze Li, Chao Song, Yaorui Shi, Yibo Miao, Hanqi Gao, Kai Zhang
cs.AI
Abstract
Experience-driven self-evolution is critical for large language model (LLM) agents to improve through open-world interaction. However, existing experience learning methods mostly rely on single-agent loops, where the same agent executes tasks, summarizes outcomes, and determines memory content. This setup makes agents vulnerable to the Self-Confirmation Trap: wrong-but-self-consistent trajectories are misidentified as successful experience, leading to cumulative errors during retrieval and reuse. To address this issue, we propose EDV, an Execute-Distill-Verify framework for reliable experience learning. In the Execute stage, multiple heterogeneous agents explore the same task space in parallel to generate diverse candidate trajectories. In the Distill stage, a dedicated third-party agent comparatively analyzes these trajectories to produce candidate experiences, reducing executor-centric summarization bias. In the Verify stage, the execution group validates candidates via a consensus mechanism, and only approved experiences are written into shared or private memory. By decoupling the three stages, EDV transforms experience learning from isolated self-reflection into collaborative construction, filtering erroneous and noisy content before memory insertion. We evaluate EDV on three challenging long-horizon benchmarks: tau2-bench, Mind2Web and MMTB. Results show EDV consistently outperforms strong baselines, validating that reliable experience construction is essential for robust agent self-evolution. Our code is available at https://github.com/shidingz/EDV.
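The three decoupled stages described in the abstract can be illustrated with a minimal Python sketch. This is not taken from the EDV repository: every name here (`Trajectory`, `Memory`, `edv_round`, `min_approval`, and the toy agents) is hypothetical, the executors run sequentially rather than in parallel, a simple approval-fraction vote stands in for the consensus mechanism (whose actual form the abstract does not specify), and a single list stands in for the shared or private memory.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Trajectory:
    agent_name: str
    steps: List[str]
    claimed_success: bool  # the executor's own verdict, which may be wrong


@dataclass
class Memory:
    experiences: List[str] = field(default_factory=list)


# Hypothetical role signatures (assumptions, not the paper's API).
Executor = Callable[[str], Trajectory]
Distiller = Callable[[str, List[Trajectory]], List[str]]
Verifier = Callable[[str, str], bool]  # (task, candidate experience) -> approve?


def edv_round(task: str, executors: List[Executor], distiller: Distiller,
              verifiers: List[Verifier], memory: Memory,
              min_approval: float = 0.5) -> List[str]:
    """Run one Execute-Distill-Verify round and return the accepted experiences."""
    if not verifiers:
        raise ValueError("at least one verifier is required")
    # Execute: several different agents attempt the same task.
    trajectories = [run(task) for run in executors]
    # Distill: a separate agent compares trajectories and proposes candidates.
    candidates = distiller(task, trajectories)
    # Verify: only candidates approved by more than min_approval of the
    # verifiers are written to memory (assumed voting rule).
    accepted = []
    for candidate in candidates:
        votes = sum(1 for approve in verifiers if approve(task, candidate))
        if votes / len(verifiers) > min_approval:
            memory.experiences.append(candidate)
            accepted.append(candidate)
    return accepted


def demo() -> Tuple[List[str], Memory]:
    """Toy run with hard-coded stand-ins for LLM agents."""
    def careful_agent(task: str) -> Trajectory:
        return Trajectory("careful", ["look up order", "confirm eligibility",
                                      "issue refund"], True)

    def hasty_agent(task: str) -> Trajectory:
        # Wrong but self-consistent: it skips the check yet claims success.
        return Trajectory("hasty", ["issue refund"], True)

    def toy_distiller(task: str, trajectories: List[Trajectory]) -> List[str]:
        return ["Confirm eligibility before issuing a refund.",
                "Issue the refund immediately."]

    def strict(task: str, candidate: str) -> bool:
        return "before" in candidate

    def lenient(task: str, candidate: str) -> bool:
        return True

    memory = Memory()
    accepted = edv_round("process a refund request",
                         [careful_agent, hasty_agent], toy_distiller,
                         [strict, strict, lenient], memory)
    return accepted, memory
```

In this toy run, the candidate derived from the hasty trajectory receives one approval out of three and is discarded before it reaches memory, which is the filtering behavior the abstract attributes to the Verify stage. A single-agent loop would correspond to passing the same agent as executor, distiller, and sole verifier, where nothing checks the executor's own success claim.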