PACEvolve++: Improving Test-time Learning for Evolutionary Search Agents
May 7, 2026
Authors: Minghao Yan, Bo Peng, Benjamin Coleman, Ziqi Chen, Zhouhang Xie, Shuo Chen, Zhankui He, Noveen Sachdeva, Weili Wang, Ed H. Chi, Shivaram Venkataraman, Wang-Cheng Kang, Derek Zhiyuan Cheng, Beidou Wang
cs.AI
Abstract
Large language models have become drivers of evolutionary search, but most systems rely on a fixed, prompt-elicited policy to sample the next candidates. This limits adaptation in practical engineering and research tasks, where evaluations are expensive and progress depends on learning task-specific search dynamics. We introduce PACEvolve++, an advisor-model reinforcement learning framework for test-time policy adaptation in evolutionary search agents. PACEvolve++ decouples strategic search decisions from implementation: a trainable advisor generates, assesses, and selects hypotheses, while a stronger frontier model translates the selected hypotheses into executable candidates. To train the advisor under non-stationary feedback, we propose a phase-adaptive approach that adapts the optimization strategy to different phases of the evolutionary process. Early in evolution, it uses group-relative feedback to learn broad search preferences; later, as reward gaps compress, it emphasizes best-of-k frontier contribution to support stable refinement. Across expert-parallel load balancing, sequential recommendation, and protein fitness extrapolation, PACEvolve++ outperforms state-of-the-art evolutionary search frameworks built on frontier models, achieving faster convergence and stabilizing test-time training during evolutionary search.
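To make the phase-adaptive idea concrete, the switch between group-relative feedback and best-of-k credit could be sketched as below. This is a minimal illustration, not the paper's implementation: the function name `phase_adaptive_advantages`, the reward-gap switching rule, and the `gap_threshold` parameter are all assumptions for the sake of the example.

```python
import statistics

def phase_adaptive_advantages(rewards, gap_threshold=0.05):
    """Hypothetical sketch of a phase-adaptive advantage signal.

    Early phase (reward gap still wide): group-relative advantages,
    i.e. each candidate's reward minus the group mean, scaled by the
    group standard deviation, so the advisor learns broad preferences.
    Late phase (reward gap compressed below gap_threshold): best-of-k
    credit assignment, where only the top candidate in the group is
    reinforced, supporting stable refinement.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    gap = max(rewards) - min(rewards)
    if gap > gap_threshold:
        # Exploration phase: score every candidate relative to the group.
        return [(r - mean) / (std + 1e-8) for r in rewards]
    # Refinement phase: only the best-of-k candidate receives credit.
    best = max(rewards)
    return [1.0 if r == best else 0.0 for r in rewards]
```

With a wide reward spread, e.g. `phase_adaptive_advantages([0.2, 0.8, 0.5])`, the signal is group-relative (negative for below-mean candidates, positive for above-mean ones); with a compressed spread, e.g. `[0.50, 0.51, 0.52]`, only the best candidate receives a nonzero advantage.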