AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration

Abstract

Automating scientific discovery requires more than generating papers from ideas. Real research is iterative: hypotheses are challenged from multiple perspectives, failed experiments inform the next attempt, and lessons accumulate across cycles. Existing autonomous research systems often model this process as a linear pipeline: they rely on single-agent reasoning, stop when execution fails, and do not carry experience across runs. We present AutoResearchClaw, a multi-agent autonomous research pipeline built on five mechanisms: (1) structured multi-agent debate for hypothesis generation and result analysis; (2) a self-healing executor with a Pivot/Refine decision loop that turns failures into information; (3) verifiable result reporting that prevents fabricated numbers and hallucinated citations; (4) human-in-the-loop collaboration with seven intervention modes, ranging from full autonomy to step-by-step oversight; and (5) cross-run evolution that converts past mistakes into future safeguards. On ARC-Bench, a 25-topic experiment-stage benchmark, AutoResearchClaw outperforms AI Scientist v2 by 54.7%. A human-in-the-loop ablation across the seven intervention modes shows that precise, targeted collaboration at high-leverage decision points consistently outperforms both full autonomy and exhaustive step-by-step oversight. We position AutoResearchClaw as a research amplifier that augments rather than replaces human scientific judgment. Code is available at https://github.com/aiming-lab/AutoResearchClaw.
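
The abstract names a Pivot/Refine decision loop but does not describe its internals. As a rough illustration only, the sketch below shows one way such a loop could be structured: retry a revised version of the current plan a few times (refine), then move to a different plan (pivot), recording each failure as a lesson. Every name here (`run_with_recovery`, `decide`, `RunLog`, `fake_execute`, the `max_refines` threshold) is hypothetical and is not taken from the AutoResearchClaw codebase.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    plan: str
    ok: bool
    error: Optional[str] = None


@dataclass
class RunLog:
    attempts: List[Attempt] = field(default_factory=list)
    lessons: List[str] = field(default_factory=list)  # failures kept as information


def decide(consecutive_failures: int, max_refines: int) -> str:
    """Hypothetical policy: refine the same plan a few times, then pivot."""
    return "refine" if consecutive_failures <= max_refines else "pivot"


def run_with_recovery(
    plans: List[str],
    execute: Callable[[str], Tuple[bool, Optional[str]]],
    max_refines: int = 2,
    max_attempts: int = 10,
) -> RunLog:
    """Try plans in order; on failure, refine the current plan or pivot to the next."""
    log = RunLog()
    plan_idx, failures, revision = 0, 0, 0
    while plan_idx < len(plans) and len(log.attempts) < max_attempts:
        base = plans[plan_idx]
        plan = base if revision == 0 else f"{base} (rev {revision})"
        ok, error = execute(plan)
        log.attempts.append(Attempt(plan, ok, error))
        if ok:
            return log
        failures += 1
        log.lessons.append(f"{plan}: {error}")
        if decide(failures, max_refines) == "refine":
            revision += 1  # keep the plan, change the details
        else:
            plan_idx, failures, revision = plan_idx + 1, 0, 0  # abandon the plan
    return log


def fake_execute(plan: str) -> Tuple[bool, Optional[str]]:
    """Stand-in for a real experiment runner (an assumption for this demo)."""
    if plan.startswith("smaller"):
        return True, None
    return False, "out of memory"


if __name__ == "__main__":
    log = run_with_recovery(["baseline", "smaller model"], fake_execute)
    for attempt in log.attempts:
        print(attempt.plan, "ok" if attempt.ok else f"failed: {attempt.error}")
```

In this toy run the first plan is tried once and refined twice before the loop pivots to the second plan. A real system would presumably choose between refining and pivoting from the content of the error rather than from a fixed retry count.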