PACEvolve++: Improving Test-time Learning for Evolutionary Search Agents

Abstract

Large language models have become drivers of evolutionary search, but most systems rely on a fixed, prompt-elicited policy to sample the next candidates. This limits adaptation in practical engineering and research tasks, where evaluations are expensive and progress depends on learning task-specific search dynamics. We introduce PACEvolve++, an advisor-model reinforcement learning framework for test-time policy adaptation in evolutionary search agents. PACEvolve++ decouples strategic search decisions from implementation: a trainable advisor generates, assesses, and selects hypotheses, while a stronger frontier model translates the selected hypotheses into executable candidates. To train the advisor under non-stationary feedback, we propose a phase-adaptive approach that adapts its optimization strategy to the different phases of the evolutionary process. Early in evolution, it uses group-relative feedback to learn broad search preferences; later, as reward gaps compress, it emphasizes best-of-k frontier contribution to support stable refinement. Across expert-parallel load balancing, sequential recommendation, and protein fitness extrapolation, PACEvolve++ outperforms the state-of-the-art evolutionary search framework with frontier models, converging faster and stabilizing test-time training during evolutionary search.
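The phase-adaptive idea can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the function names, the mean/standard-deviation normalization used for "group-relative feedback", the rule that credits only the best sample of a group by how far it exceeds the current frontier, and the `gap_threshold` used to switch phases. It is not the authors' method.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Early phase (assumed form): score each sample relative to its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def frontier_contribution(rewards, frontier_best):
    """Late phase (assumed form): credit only the group's best sample,
    by how much it improves on the best score found so far."""
    best_idx = max(range(len(rewards)), key=rewards.__getitem__)
    gain = max(0.0, rewards[best_idx] - frontier_best)
    return [gain if i == best_idx else 0.0 for i in range(len(rewards))]


def phase_adaptive_signal(rewards, frontier_best, gap_threshold=0.05):
    """Pick the training signal from the within-group reward gap.

    gap_threshold is a hypothetical hyperparameter: a wide gap selects
    group-relative feedback, a compressed gap selects frontier contribution.
    """
    gap = max(rewards) - min(rewards)
    if gap >= gap_threshold:
        return "group_relative", group_relative_advantages(rewards)
    return "frontier", frontier_contribution(rewards, frontier_best)


if __name__ == "__main__":
    # Early in the search: rewards are spread out.
    print(phase_adaptive_signal([0.1, 0.5, 0.9], frontier_best=0.4))
    # Late in the search: rewards are compressed near the frontier.
    print(phase_adaptive_signal([0.90, 0.91, 0.92], frontier_best=0.915))
```

In this sketch, a compressed group sends most of its normalized advantages toward noise, which is one plausible reason to switch to a signal that rewards only genuine frontier improvements; the actual switching criterion in PACEvolve++ may differ.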

