AutoResearch AI: Toward AI-Driven Research Automation for Scientific Discovery

Abstract

Scientific research is being reshaped by AI systems that move beyond isolated assistance toward longer-horizon workflows spanning literature grounding, hypothesis generation, experimentation, validation, reporting, and revision. This shift marks a transition from task-level AI for science to workflow-level research automation. Yet current systems remain fragmented: they differ in autonomy, domain scope, execution environment, validation mechanism, and human oversight, and they still struggle with evidence preservation, reproducibility, rejection of weak research directions, provenance tracking, cross-domain robustness, and accountable scientific closure. This survey examines these developments through AutoResearch, which we define as the developmental spectrum of AI-powered scientific workflow automation. Within this spectrum, Vibe Research denotes the human-steered region of prompt-based assistance and human-verified execution, whereas emerging AI-led systems coordinate larger portions of the discovery loop without achieving robust autonomy. We analyze how research systems redistribute control, evidence, execution, validation, and accountability across the workflow, and we organize the field around five workflow conditions: literature and research grounding; hypothesis formation and planning; experimentation and tool use; feedback, validation, and review; and reporting and knowledge communication. We further synthesize AI scientist systems, mixed-initiative co-research frameworks, benchmarks, domain deployments, and open-source infrastructures. Finally, we propose five evaluation dimensions (novelty, validity, impact, reliability, and provenance) and show that AutoResearch autonomy is domain-conditioned: it is more credible in structured, executable, and rapidly verifiable settings, but limited in embodied, delayed-feedback, heterogeneous, ethically sensitive, or institutionally accountable contexts.