Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

Abstract

Despite the rapid growth of machine learning research, the corresponding code implementations are often unavailable, which makes it slow and labor-intensive for researchers to reproduce results and build on prior work. Meanwhile, recent Large Language Models (LLMs) excel at understanding scientific documents and generating high-quality code. Inspired by this, we introduce PaperCoder, a multi-agent LLM framework that transforms machine learning papers into functional code repositories. PaperCoder operates in three stages: planning, where it constructs a high-level roadmap, designs the system architecture with diagrams, identifies file dependencies, and generates configuration files; analysis, which focuses on interpreting implementation-specific details; and generation, where modular, dependency-aware code is produced. Each stage is instantiated through a set of specialized agents designed to collaborate effectively across the pipeline. We then evaluate PaperCoder on generating code implementations from machine learning papers using both model-based and human evaluations, the latter specifically from the original paper authors, with author-released repositories as ground truth when available. Our results demonstrate the effectiveness of PaperCoder in creating high-quality, faithful implementations. Furthermore, it consistently shows strengths on the recently released PaperBench benchmark, surpassing strong baselines by substantial margins.
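The three stages named in the abstract (planning, analysis, generation) can be pictured with a minimal sketch. This is not PaperCoder's actual implementation: the `LLM` callable, the bracketed prompt tags, the `file: dep, dep` response format, and every function name below are assumptions made for illustration only. The sketch shows just the control flow, in which files are generated in dependency order so that each file sees the code it depends on.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List

# Hypothetical model interface: prompt text in, response text out.
LLM = Callable[[str], str]


@dataclass
class Plan:
    roadmap: str
    architecture: str
    dependencies: Dict[str, List[str]]  # file -> files it depends on
    config: str

    def file_order(self) -> List[str]:
        # Dependencies are listed before the files that rely on them.
        return list(TopologicalSorter(self.dependencies).static_order())


def parse_dependencies(text: str) -> Dict[str, List[str]]:
    # Assumed response format, one file per line: "main.py: model.py, data.py"
    deps: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if line.strip():
            name, _, rest = line.partition(":")
            deps[name.strip()] = [d.strip() for d in rest.split(",") if d.strip()]
    return deps


def plan_stage(paper: str, llm: LLM) -> Plan:
    # Planning: roadmap, architecture, file dependencies, configuration.
    return Plan(
        roadmap=llm(f"[roadmap]\n{paper}"),
        architecture=llm(f"[architecture]\n{paper}"),
        dependencies=parse_dependencies(llm(f"[dependencies]\n{paper}")),
        config=llm(f"[config]\n{paper}"),
    )


def analysis_stage(paper: str, plan: Plan, llm: LLM) -> Dict[str, str]:
    # Analysis: per-file notes on implementation-specific details.
    return {
        path: llm(f"[analyze] {path}\n{plan.roadmap}\n{paper}")
        for path in plan.file_order()
    }


def generation_stage(plan: Plan, notes: Dict[str, str], llm: LLM) -> Dict[str, str]:
    # Generation: write each file with its already-generated dependencies as context.
    repo: Dict[str, str] = {}
    for path in plan.file_order():
        context = "\n".join(repo[dep] for dep in plan.dependencies.get(path, []))
        repo[path] = llm(f"[generate] {path}\n{notes[path]}\n{plan.config}\n{context}")
    return repo


def run_pipeline(paper: str, llm: LLM) -> Dict[str, str]:
    plan = plan_stage(paper, llm)
    notes = analysis_stage(paper, plan, llm)
    return generation_stage(plan, notes, llm)


def stub_llm(prompt: str) -> str:
    # Stand-in for a real model so the sketch runs offline.
    if prompt.startswith("[dependencies]"):
        return "main.py: model.py, data.py\nmodel.py:\ndata.py:"
    return "# response to: " + prompt.splitlines()[0]
```

Calling `run_pipeline("paper text", stub_llm)` returns a dictionary with entries for `model.py`, `data.py`, and `main.py`, with `main.py` generated last because it depends on the other two. A real system would replace `stub_llm` with calls to an actual model and would likely use separate agents per stage, as the abstract describes.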
