Escaping the Self-Confirmation Trap: An Execute-Distill-Verify Paradigm for Agentic Experience Learning

Abstract

Experience-driven self-evolution is critical for large language model (LLM) agents to improve through open-world interaction. However, most existing experience learning methods rely on a single-agent loop in which the same agent executes tasks, summarizes outcomes, and decides what is written to memory. This setup leaves agents vulnerable to the Self-Confirmation Trap: incorrect but self-consistent trajectories are misidentified as successful experiences, and the errors accumulate as those experiences are retrieved and reused. To address this issue, we propose EDV, an Execute-Distill-Verify framework for reliable experience learning. In the Execute stage, multiple heterogeneous agents explore the same task space in parallel to generate diverse candidate trajectories. In the Distill stage, a dedicated third-party agent comparatively analyzes these trajectories to produce candidate experiences, reducing executor-centric summarization bias. In the Verify stage, the execution group validates the candidates via a consensus mechanism, and only approved experiences are written into shared or private memory. By decoupling the three stages, EDV transforms experience learning from isolated self-reflection into collaborative construction, filtering erroneous and noisy content before it is inserted into memory. We evaluate EDV on three challenging long-horizon benchmarks: tau2-bench, Mind2Web, and MMTB. Results show that EDV consistently outperforms strong baselines, supporting the claim that reliable experience construction is essential for robust agent self-evolution. Our code is available at https://github.com/shidingz/EDV.
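
The control flow of the three stages can be illustrated with a minimal Python sketch. It is not the authors' implementation (see the repository above for that) and rests on several assumptions: the names `Trajectory`, `edv_round`, `make_executor`, `toy_distill`, and `make_verifier` are hypothetical; the consensus mechanism, which the abstract does not specify, is stood in for by a simple majority vote with a `quorum` threshold; executors run sequentially rather than in parallel; and the toy agents are plain functions instead of LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trajectory:
    """Hypothetical record of one executor's attempt at a task."""
    agent: str
    steps: List[str]
    answer: str


# Hypothetical role signatures; real agents would wrap LLM calls.
Executor = Callable[[str], Trajectory]
Distiller = Callable[[str, List[Trajectory]], List[str]]
Verifier = Callable[[str, str], bool]  # (task, candidate) -> approve?


def edv_round(task: str, executors: List[Executor], distill: Distiller,
              verifiers: List[Verifier], memory: List[str],
              quorum: float = 0.5) -> List[str]:
    """Run one Execute-Distill-Verify round and return the accepted experiences."""
    if not verifiers:
        raise ValueError("at least one verifier is required")
    # Execute: every executor explores the same task (sequential here).
    trajectories = [run(task) for run in executors]
    # Distill: a separate agent compares trajectories and proposes candidates.
    candidates = distill(task, trajectories)
    # Verify: assumed majority vote; only approved candidates reach memory.
    accepted = []
    for candidate in candidates:
        votes = sum(1 for approve in verifiers if approve(task, candidate))
        if votes / len(verifiers) > quorum:
            memory.append(candidate)
            accepted.append(candidate)
    return accepted


def make_executor(name: str, answer: str) -> Executor:
    """Toy executor that always reaches the same answer."""
    def run(task: str) -> Trajectory:
        return Trajectory(agent=name, steps=[f"{name} works on {task}"], answer=answer)
    return run


def toy_distill(task: str, trajectories: List[Trajectory]) -> List[str]:
    """Toy distiller: one candidate experience per distinct answer."""
    answers = sorted({t.answer for t in trajectories})
    return [f"For '{task}', prefer {a}" for a in answers]


def make_verifier(answer: str) -> Verifier:
    """Toy verifier that approves candidates matching its own answer."""
    def verify(task: str, candidate: str) -> bool:
        return candidate.endswith(f"prefer {answer}")
    return verify


executors = [make_executor("a", "refund"), make_executor("b", "refund"),
             make_executor("c", "deny")]
verifiers = [make_verifier("refund"), make_verifier("refund"),
             make_verifier("deny")]
memory: List[str] = []
accepted = edv_round("late delivery", executors, toy_distill, verifiers, memory)
print(accepted)
```

In this toy run the "deny" candidate receives one of three votes and is filtered out, while the "refund" candidate receives two of three and is written to memory. The point of the sketch is only that the agent producing a trajectory never decides alone whether that trajectory is stored.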