AlphaResearch: Accelerating New Algorithm Discovery with Language Models

November 11, 2025
Authors: Zhaojian Yu, Kaiyue Feng, Yilun Zhao, Shilin He, Xiao-Ping Zhang, Arman Cohan
cs.AI

Abstract

Large language models have made significant progress on complex but easy-to-verify problems, yet they still struggle to discover the unknown. In this paper, we present AlphaResearch, an autonomous research agent designed to discover new algorithms for open-ended problems. To balance the feasibility and the novelty of the discovery process, we construct a novel dual research environment that combines execution-based verification with a simulated real-world peer-review environment. AlphaResearch discovers new algorithms by iterating over three steps: (1) propose new ideas, (2) verify the ideas in the dual research environment, and (3) optimize the research proposals for better performance. To promote transparent evaluation, we construct AlphaResearchComp, a new benchmark organized as a competition of eight open-ended algorithmic problems, each carefully curated and verified through executable pipelines, objective metrics, and reproducibility checks. AlphaResearch achieves a 2/8 win rate in head-to-head comparison with human researchers, demonstrating that LLMs can accelerate algorithm discovery. Notably, the algorithm discovered by AlphaResearch on the "packing circles" problem achieves the best known performance, surpassing the results of human researchers and strong baselines from recent work (e.g., AlphaEvolve). Additionally, we conduct a comprehensive analysis of the remaining challenges in the 6/8 failure cases, providing valuable insights for future research.
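The three-step loop described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the names `Candidate`, `propose`, `peer_review`, `execute`, and `research_loop` are hypothetical, the LLM proposer is replaced by random perturbation, the simulated peer review by a simple range check, and the executable benchmark by a toy objective. The ordering (review as a gate before execution) is likewise a guess; the abstract only says the two environments are combined.

```python
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    idea: str                      # label standing in for a natural-language proposal
    params: list                   # stand-in for the program implementing the idea
    score: float = float("-inf")   # filled in by execution-based verification


def propose(parent: Candidate, step: int, rng: random.Random) -> Candidate:
    """Hypothetical proposer: perturbs the best candidate so far (an LLM in the paper)."""
    params = [p + rng.gauss(0.0, 0.1) for p in parent.params]
    return Candidate(idea=f"idea-{step}", params=params)


def peer_review(cand: Candidate) -> bool:
    """Hypothetical review gate: reject proposals outside a plausible range."""
    return all(-2.0 <= p <= 2.0 for p in cand.params)


def execute(cand: Candidate) -> float:
    """Hypothetical executable metric: toy objective, higher is better."""
    return -sum((p - 1.0) ** 2 for p in cand.params)


def research_loop(seed: list, rounds: int, rng: random.Random) -> Candidate:
    best = Candidate(idea="seed", params=list(seed))
    best.score = execute(best)
    for step in range(rounds):
        cand = propose(best, step, rng)   # (1) propose a new idea
        if not peer_review(cand):         # (2a) simulated peer review
            continue
        cand.score = execute(cand)        # (2b) execution-based verification
        if cand.score > best.score:       # (3) keep only improvements
            best = cand
    return best


if __name__ == "__main__":
    result = research_loop([0.0, 0.0], rounds=200, rng=random.Random(0))
    print(result.idea, result.score)
```

The point of the sketch is the control flow: a proposal must pass both checks before it can replace the current best, so the returned score can never fall below the seed's score.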