AI for Automated Research: Roadmap and User Guide

Abstract

AI-assisted research is crossing a threshold: fully automated systems can now generate research papers for as little as $15, while long-horizon agents can execute experiments, draft manuscripts, and simulate critique with minimal human input. Yet this productivity frontier exposes a deeper integrity problem: under scientific pressure, even frontier LLMs still fabricate results, miss hidden errors, and fail to judge novelty reliably. Surveying developments through April 2026, we present an end-to-end analysis of AI across the complete research lifecycle, organized into four epistemological phases: Creation (idea generation, literature review, coding & experiments, tables & figures), Writing (paper writing), Validation (peer review, rebuttal & revision), and Dissemination (posters, slides, videos, social media, project pages, and interactive agents). We identify a sharp, stage-dependent boundary between reliable assistance and unreliable autonomy: AI excels at structured, retrieval-grounded, and tool-mediated tasks, but remains fragile for genuinely novel ideas, research-level experiments, and scientific judgment. Generated ideas often degrade after implementation, research code lags far behind pattern-matching benchmarks, and end-to-end autonomous systems have not yet consistently reached major-venue acceptance standards. We further show that greater automation can obscure rather than eliminate failure modes, making human-governed collaboration the most credible deployment paradigm. Finally, we provide a structured taxonomy, a benchmark suite and tool inventory, cross-stage design principles, and a practitioner-oriented playbook, with resources maintained at our project page.