AI for Automated Research: A Roadmap and User Guide

Abstract

AI-assisted research is crossing a threshold: fully automated systems can now generate research papers for as little as $15, while long-horizon agents can execute experiments, draft manuscripts, and simulate critique with minimal human input. Yet this productivity frontier exposes a deeper integrity problem: under scientific pressure, even frontier LLMs still fabricate results, miss hidden errors, and fail to judge novelty reliably. Covering developments through April 2026, we present an end-to-end analysis of AI across the complete research lifecycle, organized into four epistemological phases: Creation (idea generation, literature review, coding & experiments, tables & figures), Writing (paper writing), Validation (peer review, rebuttal & revision), and Dissemination (posters, slides, videos, social media, project pages, and interactive agents). We identify a sharp, stage-dependent boundary between reliable assistance and unreliable autonomy: AI excels at structured, retrieval-grounded, and tool-mediated tasks, but remains fragile for genuinely novel ideas, research-level experiments, and scientific judgment. Generated ideas often degrade after implementation, research code lags far behind pattern-matching benchmarks, and end-to-end autonomous systems have not yet consistently reached major-venue acceptance standards. We further show that greater automation can obscure rather than eliminate failure modes, making human-governed collaboration the most credible deployment paradigm. Finally, we provide a structured taxonomy, a benchmark suite, a tool inventory, cross-stage design principles, and a practitioner-oriented playbook, with resources maintained at our project page.