Self-Improving Language Models with Bidirectional Evolutionary Search
May 27, 2026
Authors: Guowei Xu, Zhenting Qi, Huangyuan Su, Weirui Ye, Himabindu Lakkaraju, Sham M. Kakade, Yilun Du
cs.AI
Abstract
Search has been proposed as an effective method for self-improving language models and agentic systems, both for post-training sample generation and for inference. However, widely used methods such as best-of-N sampling and tree search face two fundamental limitations: they are guided by sparse verification signals, and they construct candidates primarily through autoregressive expansion, restricting exploration to regions with substantial model probability mass. To address these limitations, we propose Bidirectional Evolutionary Search (BES), a search framework that couples forward candidate evolution with backward goal decomposition. In the forward search, BES augments standard expansion with evolution operators that recombine partial trajectories to generate candidates that are difficult to obtain from a single model rollout. In the backward search, BES recursively decomposes the original task into checkable subgoals, producing dense intermediate feedback that guides the forward search. We provide theoretical motivation showing that candidates generated by expansion-only search are confined to a narrow entropy shell while evolution operators can escape it, and that backward search can exponentially reduce the number of samples required to find a correct answer. Experiments show that on challenging post-training tasks where mainstream post-training algorithms fail to improve, BES enables consistent gains, and on three open problem solving benchmarks at inference time, BES outperforms existing open-source frameworks in both average and best-case performance. Code and trained models are available at https://github.com/Embodied-Minds-Lab/BES.
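The interplay between the two search directions can be illustrated with a toy sketch. This is not the authors' implementation: the task (recovering a hidden bit string), the uniform random sampler standing in for a model rollout, the single-point splice standing in for an evolution operator, and every function name below (`decompose`, `dense_score`, `expand`, `recombine`, `search`) are assumptions made only for illustration. The backward step splits the goal into per-chunk checks, and the forward step mixes fresh rollouts with recombinations of the best partial solutions, ranked by how many checks they pass.

```python
import random
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[int, ...]
Check = Callable[[Candidate], bool]


def decompose(target: Candidate, chunk: int) -> List[Check]:
    """Backward step (toy): one checkable subgoal per chunk of the goal."""
    checks: List[Check] = []
    for start in range(0, len(target), chunk):
        end = start + chunk
        # Bind start/end as defaults so each closure checks its own slice.
        checks.append(lambda c, s=start, e=end: c[s:e] == target[s:e])
    return checks


def dense_score(candidate: Candidate, checks: Sequence[Check]) -> int:
    """Dense feedback: how many subgoals the candidate satisfies."""
    return sum(check(candidate) for check in checks)


def expand(rng: random.Random, length: int) -> Candidate:
    """Hypothetical stand-in for a model rollout: sample from scratch."""
    return tuple(rng.randint(0, 1) for _ in range(length))


def recombine(rng: random.Random, a: Candidate, b: Candidate) -> Candidate:
    """Toy evolution operator: prefix of one candidate, suffix of another."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def search(
    target: Sequence[int],
    chunk: int = 4,
    pop_size: int = 16,
    generations: int = 50,
    seed: int = 0,
) -> Tuple[Candidate, List[int]]:
    """Run the toy search; return the best candidate and per-generation best scores."""
    goal = tuple(target)
    rng = random.Random(seed)
    checks = decompose(goal, chunk)

    def score(c: Candidate) -> int:
        return dense_score(c, checks)

    population = [expand(rng, len(goal)) for _ in range(pop_size)]
    history: List[int] = []
    for gen in range(generations):
        population.sort(key=score, reverse=True)
        history.append(score(population[0]))
        if history[-1] == len(checks) or gen == generations - 1:
            break
        # Keep the top half, so the best score can never decrease.
        elite = population[: pop_size // 2]
        children = [
            recombine(rng, *rng.sample(elite, 2)) for _ in range(pop_size // 4)
        ]
        fresh = [
            expand(rng, len(goal))
            for _ in range(pop_size - len(elite) - len(children))
        ]
        population = elite + children + fresh
    return population[0], history


if __name__ == "__main__":
    best, history = search((1, 0, 1, 1, 0, 0, 1, 0))
    print(best, history)
```

In this toy setting an all-or-nothing verifier would accept a uniform 8-bit sample with probability 1/256, whereas each 4-bit subgoal is satisfied with probability 1/16 and two candidates that each solve a different chunk can be spliced into a full solution. That gap is only meant to convey the intuition behind dense feedback and recombination, not to reproduce the paper's analysis.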