PACEvolve++: Improving Test-time Learning for Evolutionary Search Agents
May 7, 2026
Authors: Minghao Yan, Bo Peng, Benjamin Coleman, Ziqi Chen, Zhouhang Xie, Shuo Chen, Zhankui He, Noveen Sachdeva, Weili Wang, Ed H. Chi, Shivaram Venkataraman, Wang-Cheng Kang, Derek Zhiyuan Cheng, Beidou Wang
cs.AI
Abstract
Large language models have become drivers of evolutionary search, but most systems rely on a fixed, prompt-elicited policy to sample the next candidates. This limits adaptation in practical engineering and research tasks, where evaluations are expensive and progress depends on learning task-specific search dynamics. We introduce PACEvolve++, an advisor-model reinforcement learning framework for test-time policy adaptation in evolutionary search agents. PACEvolve++ decouples strategic search decisions from implementation: a trainable advisor generates, assesses, and selects hypotheses, while a stronger frontier model translates the selected hypotheses into executable candidates. To train the advisor under non-stationary feedback, we propose a phase-adaptive approach that adapts the optimization strategy to different phases of the evolutionary process. Early in evolution, it uses group-relative feedback to learn broad search preferences; later, as reward gaps compress, it emphasizes best-of-k frontier contribution to support stable refinement. Across expert-parallel load balancing, sequential recommendation, and protein fitness extrapolation, PACEvolve++ outperforms state-of-the-art evolutionary search frameworks built on frontier models, achieving faster convergence and stabilizing test-time training during evolutionary search.
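To make the phase-adaptive idea concrete, the switch between group-relative feedback and best-of-k credit could be sketched as below. This is a minimal illustration, not the paper's implementation: the function name `phase_adaptive_advantages`, the reward-gap switching rule, and the `gap_threshold` parameter are all assumptions for the sake of the example.

```python
import statistics

def phase_adaptive_advantages(rewards, gap_threshold=0.05):
    """Hypothetical sketch of a phase-adaptive advantage signal.

    Early phase (reward gap still wide): group-relative advantages,
    i.e. each candidate's reward minus the group mean, scaled by the
    group standard deviation, so the advisor learns broad preferences.
    Late phase (reward gap compressed below gap_threshold): best-of-k
    credit assignment, where only the top candidate in the group is
    reinforced, supporting stable refinement.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    gap = max(rewards) - min(rewards)
    if gap > gap_threshold:
        # Exploration phase: score every candidate relative to the group.
        return [(r - mean) / (std + 1e-8) for r in rewards]
    # Refinement phase: only the best-of-k candidate receives credit.
    best = max(rewards)
    return [1.0 if r == best else 0.0 for r in rewards]
```

With a wide reward spread, e.g. `phase_adaptive_advantages([0.2, 0.8, 0.5])`, the signal is group-relative (negative for below-mean candidates, positive for above-mean ones); with a compressed spread, e.g. `[0.50, 0.51, 0.52]`, only the best candidate receives a nonzero advantage.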