AutoResearchClaw: Self-Reinforcing Autonomous Research through Human-AI Collaboration

Abstract



Automating scientific discovery requires more than generating papers from ideas. Real research is iterative: hypotheses are challenged from multiple perspectives, failed experiments inform the next attempt, and lessons accumulate across cycles. Existing autonomous research systems often model this process as a linear pipeline: they rely on single-agent reasoning, halt when execution fails, and carry no experience from one run to the next. We present AutoResearchClaw, a multi-agent autonomous research pipeline built on five mechanisms: (1) structured multi-agent debate for hypothesis generation and result analysis; (2) a self-healing executor whose Pivot/Refine decision loop turns failures into information; (3) verifiable result reporting that prevents fabricated numbers and hallucinated citations; (4) human-in-the-loop collaboration with seven intervention modes, ranging from full autonomy to step-by-step oversight; and (5) cross-run evolution that converts past mistakes into safeguards for future runs. On ARC-Bench, a 25-topic experiment-stage benchmark, AutoResearchClaw outperforms AI Scientist v2 by 54.7%. A human-in-the-loop ablation across the seven intervention modes shows that precise, targeted collaboration at high-leverage decision points consistently outperforms both full autonomy and exhaustive step-by-step oversight. We position AutoResearchClaw as a research amplifier that augments, rather than replaces, human scientific judgment. Code is available at https://github.com/aiming-lab/AutoResearchClaw.
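To illustrate the idea behind verifiable result reporting, here is a minimal Python sketch, assuming experiment results are logged as plain numeric values. It flags every number in a draft that matches no logged value within a relative tolerance. The helpers `extract_numbers` and `find_unverified_numbers` and the parameter `rel_tol` are hypothetical illustrations, not AutoResearchClaw's actual implementation.

```python
import math
import re
from typing import Iterable

# Matches integers and decimals, optionally signed, optionally followed by '%'.
NUMBER_PATTERN = re.compile(r"-?\d+(?:\.\d+)?%?")


def extract_numbers(text: str) -> list[str]:
    """Return every numeric token in the text, in order of appearance."""
    return NUMBER_PATTERN.findall(text)


def find_unverified_numbers(
    draft: str,
    logged_values: Iterable[float],
    rel_tol: float = 1e-3,
) -> list[str]:
    """Return numeric tokens in `draft` that match no logged value.

    Hypothetical helper (assumption, not the paper's code): a token counts
    as verified if it lies within `rel_tol` (relative) of some logged value.
    Percent tokens are compared both as written (15.0) and as a fraction (0.15).
    """
    logged = list(logged_values)
    unverified = []
    for token in extract_numbers(draft):
        value = float(token.rstrip("%"))
        candidates = [value]
        if token.endswith("%"):
            candidates.append(value / 100.0)
        if not any(
            math.isclose(c, v, rel_tol=rel_tol) for c in candidates for v in logged
        ):
            unverified.append(token)
    return unverified


if __name__ == "__main__":
    draft = "Our method reaches 0.912 accuracy, a 15.0% gain over the baseline at 0.81."
    logged = [0.912, 0.81]  # values recorded by the (hypothetical) experiment log
    print(find_unverified_numbers(draft, logged))  # ['15.0%']
```

A check like this only catches numbers that have no logged source. It cannot tell whether a logged value is reported in the right context. Bare integers such as topic counts or version suffixes (the `2` in "v2") would also need to be excluded or whitelisted in practice.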