Jr. AI Scientist and Its Risk Report: Autonomous Scientific Exploration from a Baseline Paper
November 6, 2025
Authors: Atsuyuki Miyai, Mashiro Toyooka, Takashi Otonari, Zaiying Zhao, Kiyoharu Aizawa
cs.AI
Abstract
Understanding the current capabilities and risks of AI Scientist systems is
essential for ensuring trustworthy and sustainable AI-driven scientific
progress while preserving the integrity of the academic ecosystem. To this end,
we develop Jr. AI Scientist, a state-of-the-art autonomous AI scientist system
that mimics the core research workflow of a novice student researcher: given
the baseline paper from the human mentor, it analyzes its limitations,
formulates novel hypotheses for improvement, validates them through rigorous
experimentation, and writes a paper with the results. Unlike previous
approaches that assume full automation or operate on small-scale code, Jr. AI
Scientist follows a well-defined research workflow and leverages modern coding
agents to handle complex, multi-file implementations, leading to scientifically
valuable contributions. For evaluation, we conduct automated assessments
using AI Reviewers, author-led evaluations, and submissions to Agents4Science,
a venue dedicated to AI-driven scientific contributions. The findings
demonstrate that Jr. AI Scientist generates papers receiving higher review
scores than existing fully automated systems. Nevertheless, we identify
important limitations from both the author evaluation and the Agents4Science
reviews, indicating the potential risks of directly applying current AI
Scientist systems and key challenges for future research. Finally, we
comprehensively report various risks identified during development. We hope
these insights will deepen understanding of current progress and risks in AI
Scientist development.