Steve-Evolving: Open-World Embodied Self-Evolution via Fine-Grained Diagnosis and Dual-Track Knowledge Distillation

Abstract

Open-world embodied agents must solve long-horizon tasks where the main bottleneck is not single-step planning quality but how interaction experience is organized and evolved. To this end, we present Steve-Evolving, a non-parametric self-evolving framework that tightly couples fine-grained execution diagnosis with dual-track knowledge distillation in a closed loop. The method follows three phases: Experience Anchoring, Experience Distillation, and Knowledge-Driven Closed-Loop Control. Specifically, Experience Anchoring solidifies each subgoal attempt into a structured experience tuple with a fixed schema (pre-state, action, diagnosis-result, and post-state) and organizes it in a three-tier experience space with multi-dimensional indices (e.g., condition signatures, spatial hashing, and semantic tags) plus rolling summarization for efficient and auditable recall. To ensure sufficient information density for attribution, the execution layer provides compositional diagnosis signals beyond binary outcomes, including state-difference summaries, enumerated failure causes, continuous indicators, and stagnation/loop detection. In Experience Distillation, successful trajectories are generalized into reusable skills with explicit preconditions and verification criteria, while failures are distilled into executable guardrails that capture root causes and forbid risky operations at both subgoal and task granularities. Finally, in Knowledge-Driven Closed-Loop Control, retrieved skills and guardrails are injected into an LLM planner, and diagnosis-triggered local replanning updates the active constraints online, forming a continual evolution process without any model parameter updates. Experiments on the long-horizon suite of Minecraft MCU demonstrate consistent improvements over static-retrieval baselines.
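To make the experience tuple and the dual-track distillation more concrete, the following is a minimal Python sketch, not the authors' implementation. Every class name, field name, and failure cause here is a hypothetical illustration. The rule that a stagnating failure yields a task-level guardrail is likewise an assumption, as is the plain-text rendering for the planner; the abstract states only that guardrails exist at both subgoal and task granularities.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class FailureCause(Enum):
    # Hypothetical causes; the abstract does not enumerate the real ones.
    MISSING_ITEM = "missing_item"
    UNREACHABLE_TARGET = "unreachable_target"
    TIMEOUT = "timeout"


@dataclass
class Diagnosis:
    success: bool
    state_diff: dict                      # summary of pre/post state changes
    failure_cause: Optional[FailureCause] = None
    progress: float = 0.0                 # continuous indicator, assumed in [0, 1]
    stagnated: bool = False               # stagnation/loop detection flag


@dataclass
class Experience:
    # Fixed schema from the abstract: pre-state, action, diagnosis-result, post-state.
    pre_state: dict
    action: str
    diagnosis: Diagnosis
    post_state: dict


@dataclass
class Skill:
    action: str
    preconditions: dict
    verification: dict


@dataclass
class Guardrail:
    forbidden_action: str
    root_cause: FailureCause
    scope: str                            # "subgoal" or "task"


def distill(exp: Experience) -> Union[Skill, Guardrail, None]:
    """Route one experience down the success track or the failure track."""
    d = exp.diagnosis
    if d.success:
        # Success track: the pre-state becomes the precondition and the
        # observed state difference becomes the verification criterion.
        return Skill(exp.action, preconditions=exp.pre_state, verification=d.state_diff)
    if d.failure_cause is None:
        return None                       # too little information to attribute the failure
    # Assumed rule: a stagnating failure is escalated to a task-level guardrail.
    scope = "task" if d.stagnated else "subgoal"
    return Guardrail(exp.action, d.failure_cause, scope)


def render_for_planner(knowledge: list) -> str:
    """Format retrieved skills and guardrails as text for an LLM planner prompt."""
    lines = []
    for k in knowledge:
        if isinstance(k, Skill):
            lines.append(f"SKILL {k.action} requires {k.preconditions}")
        elif isinstance(k, Guardrail):
            lines.append(f"AVOID {k.forbidden_action} ({k.root_cause.value}, scope={k.scope})")
    return "\n".join(lines)
```

In this sketch, no model parameters are touched: the agent's behavior changes only because the text returned by `render_for_planner` changes as new experiences are distilled.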
