AlphaResearch: Accelerating New Algorithm Discovery with Language Models

November 11, 2025
Authors: Zhaojian Yu, Kaiyue Feng, Yilun Zhao, Shilin He, Xiao-Ping Zhang, Arman Cohan
cs.AI

Abstract

Large language models have made significant progress on complex but easy-to-verify problems, yet they still struggle to discover the unknown. In this paper, we present AlphaResearch, an autonomous research agent designed to discover new algorithms for open-ended problems. To balance feasibility and innovation in the discovery process, we construct a novel dual research environment that combines execution-based verification with a simulated real-world peer-review environment. AlphaResearch discovers new algorithms by iterating over the following steps: (1) propose new ideas, (2) verify the ideas in the dual research environment, and (3) optimize the research proposals for better performance. To promote a transparent evaluation process, we construct AlphaResearchComp, a new evaluation benchmark that comprises a competition of eight open-ended algorithmic problems, each carefully curated and verified through executable pipelines, objective metrics, and reproducibility checks. AlphaResearch achieves a 2/8 win rate in head-to-head comparison with human researchers, demonstrating the possibility of accelerating algorithm discovery with LLMs. Notably, the algorithm discovered by AlphaResearch on the "packing circles" problem achieves the best-known performance, surpassing the results of human researchers and strong baselines from recent work (e.g., AlphaEvolve). Additionally, we conduct a comprehensive analysis of the remaining challenges in the 6/8 failure cases, providing valuable insights for future research.
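
The three-step loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`Proposal`, `propose`, `peer_review`, `execute`, `research_loop`) is hypothetical, the "idea" is reduced to a pair of numbers, the simulated peer review is replaced by a simple novelty check, and the execution-based environment is a toy objective with a feasibility test. It only shows how a proposal might have to pass both environments before it replaces the current best.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Proposal:
    # Toy stand-in for an idea plus its program (assumption, not from the paper).
    params: Tuple[float, float]
    score: float = float("-inf")


def propose(parent: Proposal, rng: random.Random) -> Proposal:
    """Step 1 (toy): derive a new idea by perturbing the current best."""
    x, y = parent.params
    return Proposal((x + rng.gauss(0, 0.1), y + rng.gauss(0, 0.1)))


def peer_review(child: Proposal, parent: Proposal) -> bool:
    """Step 2a (toy): simulated review, reduced here to a novelty check."""
    dx = child.params[0] - parent.params[0]
    dy = child.params[1] - parent.params[1]
    return (dx * dx + dy * dy) ** 0.5 > 0.01


def execute(child: Proposal) -> Optional[float]:
    """Step 2b (toy): run the candidate; None means verification failed."""
    x, y = child.params
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        return None  # infeasible output is rejected outright
    return -((x - 0.3) ** 2 + (y - 0.7) ** 2)  # higher is better, maximum 0


def research_loop(steps: int = 200, seed: int = 0) -> Proposal:
    """Iterate propose -> verify in both environments -> keep improvements."""
    rng = random.Random(seed)
    best = Proposal((0.9, 0.1))
    best.score = execute(best)
    for _ in range(steps):
        child = propose(best, rng)
        if not peer_review(child, best):
            continue  # rejected by the simulated reviewer
        score = execute(child)
        if score is None:
            continue  # rejected by the execution-based check
        child.score = score
        if child.score > best.score:  # Step 3: optimize by keeping the better proposal
            best = child
    return best


if __name__ == "__main__":
    result = research_loop()
    print(result.params, result.score)
```

In this sketch a candidate is discarded if either check fails, so the reviewer acts as a filter in front of the objective metric. How the actual system combines the two signals is not specified in the abstract.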