Escaping the Self-Confirmation Trap: An Execute-Distill-Verify Paradigm for Agentic Experience Learning

Abstract

Experience-driven self-evolution is critical for large language model (LLM) agents to improve through open-world interaction. However, most existing experience-learning methods rely on a single-agent loop in which the same agent executes a task, summarizes the outcome, and decides what to store in memory. This setup makes agents vulnerable to the Self-Confirmation Trap: trajectories that are wrong but self-consistent are mistaken for successful experience, and the resulting errors compound as these memories are retrieved and reused. To address this issue, we propose EDV, an Execute-Distill-Verify framework for reliable experience learning. In the Execute stage, multiple heterogeneous agents explore the same task space in parallel to generate diverse candidate trajectories. In the Distill stage, a dedicated third-party agent comparatively analyzes these trajectories to produce candidate experiences, reducing executor-centric summarization bias. In the Verify stage, the execution group validates the candidates through a consensus mechanism, and only approved experiences are written into shared or private memory. By decoupling the three stages, EDV turns experience learning from isolated self-reflection into collaborative construction, filtering out erroneous and noisy content before it is inserted into memory. We evaluate EDV on three challenging long-horizon benchmarks: tau2-bench, Mind2Web, and MMTB. Results show that EDV consistently outperforms strong baselines, confirming that reliable experience construction is essential for robust agent self-evolution. Our code is available at https://github.com/shidingz/EDV.
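The abstract describes the three stages only at a high level. The following is a minimal Python sketch of one Execute-Distill-Verify round, assuming each agent can be modeled as a plain function. The names `Agent`, `Memory`, `edv_round`, and `toy_distill` are hypothetical, and so is the routing rule (unanimous approval writes to shared memory, a strict majority writes to the approvers' private memories, anything else is discarded). It illustrates the decoupling idea; it is not the authors' implementation, which lives in the linked repository.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Agent:
    """Hypothetical stand-in for one heterogeneous executor agent."""
    name: str
    execute: Callable[[str], str]  # task -> trajectory (Execute stage)
    vote: Callable[[str], bool]    # candidate experience -> approve? (Verify stage)


@dataclass
class Memory:
    shared: List[str] = field(default_factory=list)
    private: Dict[str, List[str]] = field(default_factory=dict)


def edv_round(
    task: str,
    executors: List[Agent],
    distill: Callable[[List[str]], List[str]],
    memory: Memory,
    shared_quorum: float = 1.0,   # assumption: unanimous approval -> shared memory
    private_quorum: float = 0.5,  # assumption: strict majority -> approvers' private memory
) -> List[str]:
    if not executors:
        raise ValueError("EDV needs at least one executor")

    # Execute: every executor explores the same task independently.
    trajectories = [agent.execute(task) for agent in executors]

    # Distill: a separate (third-party) function compares all trajectories and
    # proposes candidate experiences; no executor summarizes its own run.
    candidates = distill(trajectories)

    written: List[str] = []
    for cand in candidates:
        # Verify: the execution group votes on each candidate.
        approvers = [a.name for a in executors if a.vote(cand)]
        ratio = len(approvers) / len(executors)
        if ratio >= shared_quorum:
            memory.shared.append(cand)
        elif ratio > private_quorum:
            for name in approvers:
                memory.private.setdefault(name, []).append(cand)
        else:
            continue  # rejected candidates never reach memory
        written.append(cand)
    return written


# Toy usage: voting rules and candidates are made up for illustration.
def toy_distill(trajectories: List[str]) -> List[str]:
    # Hypothetical stub; a real distiller would compare the trajectories.
    return ["check order id before refund", "verify user identity", "skip verification"]


agents = [
    Agent("a", execute=lambda t: f"a:{t}", vote=lambda c: "skip" not in c),
    Agent("b", execute=lambda t: f"b:{t}", vote=lambda c: "skip" not in c),
    Agent("c", execute=lambda t: f"c:{t}", vote=lambda c: "refund" in c),
]
mem = Memory()
written = edv_round("cancel order", agents, toy_distill, mem)
```

In this toy run, the first candidate is approved by all three executors and goes to shared memory, the second is approved by two of three and goes only to the private memories of `a` and `b`, and the third is rejected. Keeping `distill` separate from the executors mirrors the decoupling the abstract argues for: no executor summarizes its own trajectory, and nothing reaches memory without a group vote.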