AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration

May 19, 2026
Authors: Jiaqi Liu, Shi Qiu, Mairui Li, Bingzhou Li, Haonian Ji, Siwei Han, Xinyu Ye, Peng Xia, Zihan Dong, Congyu Zhang, Letian Zhang, Guiming Chen, Haoqin Tu, Xinyu Yang, Lu Feng, Xujiang Zhao, Haifeng Chen, Jiawei Zhou, Xiao Wang, Weitong Zhang, Hongtu Zhu, Yun Li, Jieru Mei, Hongliang Fei, Jiaheng Zhang, Linjie Li, Linjun Zhang, Yuyin Zhou, Sheng Wang, Caiming Xiong, James Zou, Zeyu Zheng, Cihang Xie, Mingyu Ding, Huaxiu Yao
cs.AI

Abstract

Automating scientific discovery requires more than generating papers from ideas. Real research is iterative: hypotheses are challenged from multiple perspectives, experiments fail and inform the next attempt, and lessons accumulate across cycles. Existing autonomous research systems often model this process as a linear pipeline: they rely on single-agent reasoning, stop when execution fails, and do not carry experience across runs. We present AutoResearchClaw, a multi-agent autonomous research pipeline built on five mechanisms: structured multi-agent debate for hypothesis generation and result analysis, a self-healing executor with a Pivot/Refine decision loop that transforms failures into information, verifiable result reporting that prevents fabricated numbers and hallucinated citations, human-in-the-loop collaboration with seven intervention modes spanning full autonomy to step-by-step oversight, and cross-run evolution that converts past mistakes into future safeguards. On ARC-Bench, a 25-topic experiment-stage benchmark, AutoResearchClaw outperforms AI Scientist v2 by 54.7%. A human-in-the-loop ablation across seven intervention modes reveals that precise, targeted collaboration at high-leverage decision points consistently outperforms both full autonomy and exhaustive step-by-step oversight. We position AutoResearchClaw as a research amplifier that augments rather than replaces human scientific judgment. Code is available at https://github.com/aiming-lab/AutoResearchClaw.
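
The abstract names a self-healing executor with a Pivot/Refine decision loop and cross-run evolution, but does not specify how either works. The following is only a minimal sketch of how such a loop might be structured, not the authors' implementation. Every name in it (`Attempt`, `LessonStore`, `run_with_recovery`, `max_refines`, `max_attempts`) is hypothetical, and the rule "refine a fixed number of times, then pivot" is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Attempt:
    """Outcome of executing one experiment plan (hypothetical structure)."""
    plan: str
    ok: bool
    error: Optional[str] = None


@dataclass
class LessonStore:
    """Hypothetical cross-run memory: failures kept as plain-text lessons."""
    lessons: List[str] = field(default_factory=list)

    def record(self, plan: str, error: str) -> None:
        self.lessons.append(f"{plan}: {error}")


def run_with_recovery(
    plan: str,
    execute: Callable[[str], Attempt],
    refine: Callable[[str, str], str],
    pivot: Callable[[str], str],
    store: LessonStore,
    max_refines: int = 2,
    max_attempts: int = 6,
) -> List[Attempt]:
    """Execute a plan; on failure, refine it, and pivot once refinements run out.

    Assumed policy (not from the paper): up to `max_refines` consecutive
    refinements of the same plan, then one pivot to a different plan.
    """
    history: List[Attempt] = []
    refines = 0
    for _ in range(max_attempts):
        attempt = execute(plan)
        history.append(attempt)
        if attempt.ok:
            break
        # Keep the failure so that later runs can consult it.
        store.record(plan, attempt.error or "unknown error")
        if refines < max_refines:
            plan = refine(plan, attempt.error or "")
            refines += 1
        else:
            plan = pivot(plan)
            refines = 0
    return history
```

The caller supplies `execute`, `refine`, and `pivot` as callables, so the control flow can be exercised with simple stand-in functions; in a real system these would presumably wrap agent calls. The returned history holds every attempt in order, and the store accumulates one lesson per failure.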