
**The Last Human-Written Paper: Agent-Native Research Artifacts**

April 29, 2026
Authors: Jiachen Liu, Jiaxin Pei, Jintao Huang, Chenglei Si, Ao Qu, Xiangru Tang, Runyu Lu, Lichang Chen, Xiaoyan Bai, Haizhong Zheng, Carl Chen, Zhiyang Chen, Haojie Ye, Yujuan Fu, Zexue He, Zijian Jin, Zhenyu Zhang, Shangquan Sun, Maestro Harmon, John Dianzhuo Wang, Jianqiao Zeng, Jiachen Sun, Mingyuan Wu, Baoyu Zhou, Chenyu You, Shijian Lu, Yiming Qiu, Fan Lai, Yuan Yuan, Yao Li, Junyuan Hong, Ruihao Zhu, Beidi Chen, Alex Pentland, Ang Chen, Mosharaf Chowdhury, Zechen Zhang
cs.AI

Abstract

Scientific publication compresses a branching, iterative research process into a linear narrative, discarding the majority of what was discovered along the way. This compilation imposes two structural costs: a Storytelling Tax, where failed experiments, rejected hypotheses, and the branching exploration process are discarded to fit a linear narrative; and an Engineering Tax, where the gap between reviewer-sufficient prose and agent-sufficient specification leaves critical implementation details unwritten. Tolerable for human readers, these costs become critical when AI agents must understand, reproduce, and extend published work. We introduce the Agent-Native Research Artifact (ARA), a protocol that replaces the narrative paper with a machine-executable research package structured around four layers: scientific logic, executable code with full specifications, an exploration graph that preserves the failures compilation discards, and evidence grounding every claim in raw outputs. Three mechanisms support the ecosystem: a Live Research Manager that captures decisions and dead ends during ordinary development; an ARA Compiler that translates legacy PDFs and repos into ARAs; and an ARA-native review system that automates objective checks so human reviewers can focus on significance, novelty, and taste. On PaperBench and RE-Bench, ARA raises question-answering accuracy from 72.4% to 93.7% and reproduction success from 57.4% to 64.4%. On RE-Bench's five open-ended extension tasks, the failure traces preserved in an ARA accelerate progress, but, depending on the agent's capabilities, can also discourage a capable agent from exploring beyond the search space of the prior runs.
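The four-layer package structure described in the abstract can be pictured as a minimal data model. This is only an illustrative sketch: the class and field names below are assumptions for exposition, not part of any published ARA specification.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """A scientific claim anchored to the raw outputs that support it (evidence layer)."""
    statement: str
    evidence_paths: list[str] = field(default_factory=list)  # e.g. logs, metric dumps


@dataclass
class ExplorationNode:
    """One branch of the research process, including dead ends (exploration graph layer)."""
    hypothesis: str
    outcome: str  # e.g. "confirmed", "rejected", "abandoned"
    children: list["ExplorationNode"] = field(default_factory=list)


@dataclass
class ARA:
    """Agent-Native Research Artifact: a machine-executable research package."""
    scientific_logic: list[Claim]       # layer 1: the claims and their reasoning
    code_spec: dict[str, str]           # layer 2: executable code with full specs
    exploration_graph: ExplorationNode  # layer 3: preserved failures and branches
    evidence_index: dict[str, str]      # layer 4: claim id -> raw output path
```

Under this hypothetical model, an agent extending the work would walk `exploration_graph` to see which hypotheses were already rejected, rather than rediscovering them from a linear narrative.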
May 2, 2026