RankEvolve: Automating the Discovery of Retrieval Algorithms via LLM-Driven Evolution

Abstract

Retrieval algorithms like BM25 and query likelihood with Dirichlet smoothing remain strong and efficient first-stage rankers, yet improvements have mostly relied on parameter tuning and human intuition. We investigate whether a large language model, guided by an evaluator and evolutionary search, can automatically discover improved lexical retrieval algorithms. We introduce RankEvolve, a program evolution setup based on AlphaEvolve, in which candidate ranking algorithms are represented as executable code and iteratively mutated, recombined, and selected based on retrieval performance across 12 IR datasets from BEIR and BRIGHT. RankEvolve starts from two seed programs: BM25 and query likelihood with Dirichlet smoothing. The evolved algorithms are novel, effective, and show promising transfer to the full BEIR and BRIGHT benchmarks as well as TREC DL 19 and 20. Our results suggest that evaluator-guided LLM program evolution is a practical path towards automatic discovery of novel ranking algorithms.
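
For reference, the two seed programs named above can be sketched in their standard textbook forms. This is a minimal illustration, not the RankEvolve seed code: the function names, the pre-tokenized list inputs, and the default parameters (k1 = 1.2, b = 0.75, mu = 2000) are assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of one tokenized document for a tokenized query.

    Parameter defaults are illustrative assumptions, not values from the paper.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in docs if term in d)  # document frequency
        if df == 0 or tf[term] == 0:
            continue
        # Non-negative IDF variant (the "+1 inside the log" form).
        idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
        # Term-frequency saturation with document-length normalization.
        norm = tf[term] + k1 * (1.0 - b + b * len(doc) / avg_len)
        score += idf * tf[term] * (k1 + 1.0) / norm
    return score


def ql_dirichlet_score(query, doc, docs, mu=2000.0):
    """Query log-likelihood under a Dirichlet-smoothed document language model."""
    collection_tf = Counter(t for d in docs for t in d)
    collection_len = sum(collection_tf.values())
    tf = Counter(doc)
    score = 0.0
    for term in query:
        p_coll = collection_tf[term] / collection_len
        if p_coll == 0.0:
            continue  # skip query terms that never occur in the collection
        score += math.log((tf[term] + mu * p_coll) / (len(doc) + mu))
    return score


def rank(score_fn, query, docs):
    """Return document indices sorted from highest to lowest score."""
    return sorted(
        range(len(docs)),
        key=lambda i: score_fn(query, docs[i], docs),
        reverse=True,
    )
```

Both functions recompute collection statistics on every call, which keeps the sketch short but would be far too slow for a real index. In an evolutionary setup like the one described, a scoring function of this shape is the kind of unit that could be mutated and then re-scored by an evaluator.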