Jr. AI Scientist and Its Risk Report: Autonomous Scientific Exploration from a Baseline Paper

Abstract



Understanding the current capabilities and risks of AI Scientist systems is essential for ensuring trustworthy and sustainable AI-driven scientific progress while preserving the integrity of the academic ecosystem. To this end, we develop Jr. AI Scientist, a state-of-the-art autonomous AI scientist system that mimics the core research workflow of a novice student researcher: given a baseline paper from a human mentor, it analyzes the paper's limitations, formulates novel hypotheses for improvement, validates them through rigorous experimentation, and writes a paper reporting the results. Unlike previous approaches that assume full automation or operate only on small-scale code, Jr. AI Scientist follows a well-defined research workflow and leverages modern coding agents to handle complex, multi-file implementations, leading to scientifically valuable contributions. For evaluation, we conduct automated assessments using AI reviewers, author-led evaluations, and submissions to Agents4Science, a venue dedicated to AI-driven scientific contributions. The results show that papers generated by Jr. AI Scientist receive higher review scores than those produced by existing fully automated systems. Nevertheless, both the author evaluation and the Agents4Science reviews reveal important limitations, which point to potential risks of applying current AI Scientist systems directly and to key challenges for future research. Finally, we comprehensively report the various risks identified during development. We hope these insights will deepen the understanding of current progress and risks in AI Scientist development.