Self-Improving Language Models with Bidirectional Evolutionary Search

May 27, 2026
Authors: Guowei Xu, Zhenting Qi, Huangyuan Su, Weirui Ye, Himabindu Lakkaraju, Sham M. Kakade, Yilun Du
cs.AI

Abstract

Search has been proposed as an effective method for self-improving language models and agentic systems, both for post-training sample generation and for inference. However, widely used methods such as best-of-N sampling and tree search face two fundamental limitations: they are guided by sparse verification signals, and they construct candidates primarily through autoregressive expansion, which restricts exploration to regions with substantial model probability mass. To address these limitations, we propose Bidirectional Evolutionary Search (BES), a search framework that couples forward candidate evolution with backward goal decomposition. In the forward search, BES augments standard expansion with evolution operators that recombine partial trajectories to generate candidates that are difficult to obtain from a single model rollout. In the backward search, BES recursively decomposes the original task into checkable subgoals, producing dense intermediate feedback that guides the forward search. We provide theoretical motivation showing that candidates generated by expansion-only search are confined to a narrow entropy shell, whereas evolutionary operators can escape it, and that backward search can exponentially reduce the number of samples required to find a correct answer. Experiments show that on challenging post-training tasks where mainstream post-training algorithms fail to improve, BES enables consistent gains, and that on three open problem solving benchmarks at inference time, BES outperforms existing open-source frameworks in both average and best-case performance. Code and trained models are available at https://github.com/Embodied-Minds-Lab/BES.
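
To make the contrast between sparse and dense feedback concrete, here is a minimal toy sketch in Python. It is not the BES implementation and makes strong simplifying assumptions: the task splits into eight independent subgoals, each subgoal has an exact checker written by hand (BES obtains subgoals by recursive decomposition), and a "rollout" is a uniform random guess. All names (`TARGET`, `rollout`, `recombine`, `best_of_n`, `evolve_with_subgoals`) are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical hidden solution: one value per subgoal (assumption for this toy).
TARGET = [3, 1, 2, 0, 3, 2, 1, 0]
NUM_CHOICES = 4  # the toy "model" guesses each value uniformly from 0..3


def rollout(rng):
    """Toy stand-in for one model rollout: an independent guess per subgoal."""
    return [rng.randrange(NUM_CHOICES) for _ in TARGET]


def subgoal_checks(candidate):
    """Dense feedback: one pass/fail result per subgoal."""
    return [c == t for c, t in zip(candidate, TARGET)]


def final_check(candidate):
    """Sparse feedback: a single pass/fail result for the whole task."""
    return all(subgoal_checks(candidate))


def recombine(a, b):
    """Toy evolution operator: keep a's part where it passes, else take b's."""
    return [x if ok else y for x, y, ok in zip(a, b, subgoal_checks(a))]


def best_of_n(seed, budget):
    """Expansion-only baseline: rollouts used until one passes, or None."""
    rng = random.Random(seed)
    for used in range(1, budget + 1):
        if final_check(rollout(rng)):
            return used
    return None


def evolve_with_subgoals(seed, budget):
    """Recombine each new rollout into the best candidate so far."""
    rng = random.Random(seed)
    best = rollout(rng)
    for used in range(1, budget + 1):
        if final_check(best):
            return used
        best = recombine(best, rollout(rng))
    return None


if __name__ == "__main__":
    print("best-of-N rollouts:", best_of_n(seed=0, budget=2000))
    print("recombination rollouts:", evolve_with_subgoals(seed=0, budget=2000))
```

In this toy, a single rollout passes the final check with probability (1/4)^8 = 1/65536, so the baseline usually exhausts its budget. The recombining search only needs each subgoal to be guessed correctly once across all rollouts. This mirrors, in a very simplified form, the abstract's claim that dense subgoal feedback can sharply reduce the number of samples required.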