Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

April 24, 2025
Authors: Minju Seo, Jinheon Baek, Seongyun Lee, Sung Ju Hwang
cs.AI

Abstract

Despite the rapid growth of machine learning research, corresponding code implementations are often unavailable, making it slow and labor-intensive for researchers to reproduce results and build upon prior work. Meanwhile, recent Large Language Models (LLMs) excel at understanding scientific documents and generating high-quality code. Inspired by this, we introduce PaperCoder, a multi-agent LLM framework that transforms machine learning papers into functional code repositories. PaperCoder operates in three stages: planning, where it constructs a high-level roadmap, designs the system architecture with diagrams, identifies file dependencies, and generates configuration files; analysis, which focuses on interpreting implementation-specific details; and generation, where modular, dependency-aware code is produced. Each stage is instantiated through a set of specialized agents designed to collaborate effectively across the pipeline. We then evaluate PaperCoder's ability to generate code implementations from machine learning papers using both model-based and human evaluations, the latter specifically from the original paper authors, with author-released repositories as ground truth where available. Our results demonstrate the effectiveness of PaperCoder in creating high-quality, faithful implementations. Furthermore, it consistently shows strengths on the recently released PaperBench benchmark, surpassing strong baselines by substantial margins.
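
The three-stage flow described in the abstract (planning, analysis, generation) can be pictured with a minimal sketch. This is not the authors' implementation: every name here (`Plan`, `plan_stage`, `analysis_stage`, `generation_stage`, `paper_to_repo`, `fake_llm`) and every prompt string is a hypothetical placeholder, and the LLM is assumed to be any callable that maps a prompt string to a response string.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: an "LLM" is any function from a prompt string to a response string.
LLM = Callable[[str], str]


@dataclass
class Plan:
    roadmap: str           # high-level implementation roadmap
    architecture: str      # system architecture description
    file_order: List[str]  # files listed so that dependencies come first
    config: str            # generated configuration file contents


def plan_stage(paper: str, llm: LLM) -> Plan:
    """Planning: roadmap, architecture, file dependencies, configuration."""
    roadmap = llm(f"Write a high-level implementation roadmap for:\n{paper}")
    architecture = llm(f"Design the system architecture for:\n{roadmap}")
    files = llm(f"List files in dependency order, one per line:\n{architecture}")
    config = llm(f"Generate a configuration file for:\n{roadmap}")
    file_order = [line.strip() for line in files.splitlines() if line.strip()]
    return Plan(roadmap, architecture, file_order, config)


def analysis_stage(paper: str, plan: Plan, llm: LLM) -> Dict[str, str]:
    """Analysis: interpret implementation-specific details for each file."""
    return {
        name: llm(f"Describe implementation details of {name} given:\n"
                  f"{paper}\n{plan.architecture}")
        for name in plan.file_order
    }


def generation_stage(plan: Plan, analyses: Dict[str, str], llm: LLM) -> Dict[str, str]:
    """Generation: write files in dependency order, showing earlier files as context."""
    repo: Dict[str, str] = {}
    for name in plan.file_order:
        context = "\n".join(f"# {n}\n{src}" for n, src in repo.items())
        repo[name] = llm(f"Write {name}.\nNotes:\n{analyses[name]}\n"
                         f"Existing files:\n{context}")
    return repo


def paper_to_repo(paper: str, llm: LLM) -> Dict[str, str]:
    """Run the three stages in sequence and return a filename-to-source mapping."""
    plan = plan_stage(paper, llm)
    analyses = analysis_stage(paper, plan, llm)
    return generation_stage(plan, analyses, llm)


def fake_llm(prompt: str) -> str:
    """Stand-in model so the sketch runs offline; replace with a real client."""
    if prompt.startswith("List files"):
        return "config.yaml\nmodel.py\ntrain.py"
    return f"<response to: {prompt.splitlines()[0]}>"


if __name__ == "__main__":
    for filename in paper_to_repo("toy paper text", fake_llm):
        print(filename)
```

Generating files in dependency order and passing previously written files back as context is one plausible reading of "dependency-aware" generation; the paper itself should be consulted for how the agents actually share information.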
