
**The Last Human-Written Paper: Agent-Native Research Artifacts**

April 29, 2026
Authors: Jiachen Liu, Jiaxin Pei, Jintao Huang, Chenglei Si, Ao Qu, Xiangru Tang, Runyu Lu, Lichang Chen, Xiaoyan Bai, Haizhong Zheng, Carl Chen, Zhiyang Chen, Haojie Ye, Yujuan Fu, Zexue He, Zijian Jin, Zhenyu Zhang, Shangquan Sun, Maestro Harmon, John Dianzhuo Wang, Jianqiao Zeng, Jiachen Sun, Mingyuan Wu, Baoyu Zhou, Chenyu You, Shijian Lu, Yiming Qiu, Fan Lai, Yuan Yuan, Yao Li, Junyuan Hong, Ruihao Zhu, Beidi Chen, Alex Pentland, Ang Chen, Mosharaf Chowdhury, Zechen Zhang
cs.AI

Abstract

Scientific publication compresses a branching, iterative research process into a linear narrative, discarding the majority of what was discovered along the way. This compilation imposes two structural costs: a Storytelling Tax, where failed experiments, rejected hypotheses, and the branching exploration process are discarded to fit a linear narrative; and an Engineering Tax, where the gap between reviewer-sufficient prose and agent-sufficient specification leaves critical implementation details unwritten. Tolerable for human readers, these costs become critical when AI agents must understand, reproduce, and extend published work. We introduce the Agent-Native Research Artifact (ARA), a protocol that replaces the narrative paper with a machine-executable research package structured around four layers: scientific logic, executable code with full specifications, an exploration graph that preserves the failures compilation discards, and evidence grounding every claim in raw outputs. Three mechanisms support the ecosystem: a Live Research Manager that captures decisions and dead ends during ordinary development; an ARA Compiler that translates legacy PDFs and repos into ARAs; and an ARA-native review system that automates objective checks so human reviewers can focus on significance, novelty, and taste. On PaperBench and RE-Bench, ARA raises question-answering accuracy from 72.4% to 93.7% and reproduction success from 57.4% to 64.4%. On RE-Bench's five open-ended extension tasks, the failure traces preserved in an ARA accelerate progress, but, depending on the agent's capabilities, can also discourage a capable agent from exploring beyond the search space of the prior runs.
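The four-layer package structure described in the abstract can be pictured as a minimal data model. This is only an illustrative sketch: the class and field names below are assumptions for exposition, not part of any published ARA specification.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """A scientific claim anchored to the raw outputs that support it (evidence layer)."""
    statement: str
    evidence_paths: list[str] = field(default_factory=list)  # e.g. logs, metric dumps


@dataclass
class ExplorationNode:
    """One branch of the research process, including dead ends (exploration graph layer)."""
    hypothesis: str
    outcome: str  # e.g. "confirmed", "rejected", "abandoned"
    children: list["ExplorationNode"] = field(default_factory=list)


@dataclass
class ARA:
    """Agent-Native Research Artifact: a machine-executable research package."""
    scientific_logic: list[Claim]       # layer 1: the claims and their reasoning
    code_spec: dict[str, str]           # layer 2: executable code with full specs
    exploration_graph: ExplorationNode  # layer 3: preserved failures and branches
    evidence_index: dict[str, str]      # layer 4: claim id -> raw output path
```

Under this hypothetical model, an agent extending the work would walk `exploration_graph` to see which hypotheses were already rejected, rather than rediscovering them from a linear narrative.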
May 2, 2026