Escaping the Self-Confirmation Trap: An Execute-Distill-Verify Paradigm for Agent Experience Learning

Abstract

Experience-driven self-evolution is critical for large language model (LLM) agents that must improve through open-world interaction. However, most existing experience learning methods rely on a single-agent loop, in which the same agent executes the task, summarizes the outcome, and decides what to store in memory. This setup leaves agents vulnerable to the Self-Confirmation Trap: trajectories that are wrong but self-consistent are misjudged as successful experience, and the resulting errors accumulate during retrieval and reuse. To address this issue, we propose EDV, an Execute-Distill-Verify framework for reliable experience learning. In the Execute stage, multiple heterogeneous agents explore the same task space in parallel and generate diverse candidate trajectories. In the Distill stage, a dedicated third-party agent comparatively analyzes these trajectories to produce candidate experiences, which reduces executor-centric summarization bias. In the Verify stage, the execution group validates the candidates through a consensus mechanism, and only approved experiences are written into shared or private memory. By decoupling the three stages, EDV turns experience learning from isolated self-reflection into collaborative construction and filters out erroneous and noisy content before it is inserted into memory. We evaluate EDV on three challenging long-horizon benchmarks: tau2-bench, Mind2Web, and MMTB. The results show that EDV consistently outperforms strong baselines, validating that reliable experience construction is essential for robust agent self-evolution. Our code is available at https://github.com/shidingz/EDV.
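The three decoupled stages can be illustrated with a minimal Python sketch. It is not the released EDV implementation: every name in it (`Trajectory`, `edv_round`, `toy_distill`, `keyword_verifier`, the `quorum` parameter) is hypothetical, the LLM agents are replaced by deterministic stubs, and the consensus mechanism, which the abstract does not specify, is assumed here to be a simple majority vote.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trajectory:
    """One executor's attempt at a task (hypothetical structure)."""
    agent: str
    steps: List[str]
    success: bool


# Hypothetical interfaces standing in for LLM-backed agents.
Executor = Callable[[str], Trajectory]
Distiller = Callable[[str, List[Trajectory]], List[str]]
Verifier = Callable[[str, str], bool]


def edv_round(task: str, executors: List[Executor], distill: Distiller,
              verifiers: List[Verifier], memory: List[str],
              quorum: float = 0.5) -> List[str]:
    """Run one Execute-Distill-Verify round and return the accepted experiences."""
    if not verifiers:
        raise ValueError("at least one verifier is required")
    # Execute: every executor attempts the same task independently.
    trajectories = [run(task) for run in executors]
    # Distill: a separate agent compares trajectories and proposes experiences.
    candidates = distill(task, trajectories)
    # Verify: keep a candidate only if a strict majority approves it
    # (majority voting is an assumption of this sketch).
    accepted = [
        exp for exp in candidates
        if sum(vote(task, exp) for vote in verifiers) / len(verifiers) > quorum
    ]
    memory.extend(accepted)  # only verified experiences reach memory
    return accepted


def make_executor(name: str, steps: List[str], success: bool) -> Executor:
    """Stub executor that always returns the same trajectory."""
    return lambda task: Trajectory(name, steps, success)


def toy_distill(task: str, trajectories: List[Trajectory]) -> List[str]:
    """Contrast successful and failed trajectories step by step."""
    good = {s for t in trajectories if t.success for s in t.steps}
    bad = {s for t in trajectories if not t.success for s in t.steps}
    return ([f"do: {s}" for s in sorted(good - bad)]
            + [f"avoid: {s}" for s in sorted(bad - good)])


def keyword_verifier(*keywords: str) -> Verifier:
    """Stub verifier that approves experiences mentioning any known keyword."""
    return lambda task, exp: any(k in exp for k in keywords)


executors = [
    make_executor("a", ["check policy", "confirm identity", "issue refund"], True),
    make_executor("b", ["skip verification", "issue refund"], False),
    make_executor("c", ["confirm identity", "check policy", "issue refund"], True),
]
verifiers = [
    keyword_verifier("policy", "identity", "verification"),
    keyword_verifier("identity", "verification"),
    keyword_verifier("verification"),
]

if __name__ == "__main__":
    shared_memory: List[str] = []
    print(edv_round("refund request", executors, toy_distill, verifiers, shared_memory))
```

In this toy run the distiller proposes three candidates, and the candidate about checking the policy is dropped because only one of the three verifiers approves it, so only the two remaining experiences are written to memory. Keeping the memory write behind the vote is the point of the sketch: no single agent's summary is stored without agreement from the others.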