AutoResearchClaw: Self-Reinforcing Autonomous Research through Human-AI Collaboration

Abstract

Automating scientific discovery requires more than generating papers from ideas. Real research is iterative: hypotheses are challenged from multiple perspectives, experiments fail and inform the next attempt, and lessons accumulate across cycles. Existing autonomous research systems often model this process as a linear pipeline: they rely on single-agent reasoning, stop when execution fails, and do not carry experience across runs. We present AutoResearchClaw, a multi-agent autonomous research pipeline built on five mechanisms: structured multi-agent debate for hypothesis generation and result analysis, a self-healing executor with a Pivot/Refine decision loop that transforms failures into information, verifiable result reporting that prevents fabricated numbers and hallucinated citations, human-in-the-loop collaboration with seven intervention modes spanning full autonomy to step-by-step oversight, and cross-run evolution that converts past mistakes into future safeguards. On ARC-Bench, a 25-topic experiment-stage benchmark, AutoResearchClaw outperforms AI Scientist v2 by 54.7%. A human-in-the-loop ablation across seven intervention modes reveals that precise, targeted collaboration at high-leverage decision points consistently outperforms both full autonomy and exhaustive step-by-step oversight. We position AutoResearchClaw as a research amplifier that augments rather than replaces human scientific judgment. Code is available at https://github.com/aiming-lab/AutoResearchClaw.
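To make the idea of a Pivot/Refine decision loop that "transforms failures into information" more concrete, here is a minimal illustrative sketch in Python. It is not the released AutoResearchClaw implementation. The names `Attempt`, `LessonStore`, `pivot_refine_loop`, and `run_experiment`, the `max_refines` parameter, and the specific policy (refine the same hypothesis using the last error message, then pivot to the next hypothesis after a fixed number of failed refinements) are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Attempt:
    """One execution attempt (hypothetical record type)."""
    hypothesis: str
    succeeded: bool
    error: Optional[str] = None


@dataclass
class LessonStore:
    """Hypothetical cross-run memory of distinct failure messages."""
    lessons: List[str] = field(default_factory=list)

    def record(self, error: str) -> None:
        # Keep each distinct failure once so later runs can consult it.
        if error not in self.lessons:
            self.lessons.append(error)


def pivot_refine_loop(
    hypotheses: List[str],
    run_experiment: Callable[[str, Optional[str]], Optional[str]],
    store: LessonStore,
    max_refines: int = 2,
) -> List[Attempt]:
    """Assumed policy: Refine a hypothesis up to max_refines times, then Pivot.

    run_experiment(hypothesis, previous_error) returns None on success or an
    error string on failure (an assumed interface, for illustration only).
    """
    history: List[Attempt] = []
    for hyp in hypotheses:  # Pivot: move on to the next hypothesis
        last_error: Optional[str] = None
        for _ in range(max_refines + 1):  # initial try plus refinements
            error = run_experiment(hyp, last_error)  # Refine uses last error
            history.append(Attempt(hyp, error is None, error))
            if error is None:
                return history  # success: stop searching
            store.record(error)  # the failure becomes stored information
            last_error = error
    return history  # every hypothesis exhausted without success
```

For example, with a toy `run_experiment` that fails on hypothesis `"h1"` every time but succeeds on `"h2"` once it is given the previous error, calling `pivot_refine_loop(["h1", "h2"], toy_run, LessonStore(), max_refines=1)` refines `"h1"` once, pivots to `"h2"`, and stops after its successful refinement. Keeping the `LessonStore` outside the loop is what would let such a sketch carry lessons across runs.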