Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists

Abstract

Existing research infrastructure is fundamentally document-centric: it provides citation links between papers but lacks explicit representations of methodological evolution. In particular, it does not capture the structured relationships that explain how and why research methods emerge, adapt, and build upon one another. As AI-driven research agents become a new class of consumers of scientific knowledge, this limitation grows more consequential, because such agents cannot reliably reconstruct method evolution topologies from unstructured text. We introduce Intern-Atlas, a methodological evolution graph that automatically identifies method-level entities, infers lineage relationships among methodologies, and captures the bottlenecks that drive transitions between successive innovations. Built from 1,030,314 papers spanning AI conferences, journals, and arXiv preprints, the graph comprises 9,410,201 semantically typed edges, each grounded in verbatim source evidence, forming a queryable causal network of methodological development. To operationalize this structure, we further propose a self-guided temporal tree search algorithm that constructs evolution chains tracing the progression of methods over time. We evaluate the quality of the resulting graph against expert-curated ground-truth evolution chains and observe strong alignment. In addition, we demonstrate that Intern-Atlas enables downstream applications in idea evaluation and automated idea generation. We position methodological evolution graphs as a foundational data layer for the emerging field of automated scientific discovery.
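
To make the structure described above concrete, the following is a minimal illustrative sketch of a graph whose edges carry a semantic type and a piece of source evidence, together with a simple enumeration of time-ordered chains. It is not the Intern-Atlas implementation: the class names, the relation labels ("extends", "addresses_bottleneck", "compared_with"), the method names, the years, and the evidence strings are all hypothetical placeholders, and the plain depth-first walk stands in for, and is much simpler than, the self-guided temporal tree search proposed in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str        # earlier method (hypothetical name)
    dst: str        # later method that builds on src
    relation: str   # semantic edge type (hypothetical label set)
    evidence: str   # supporting text span (placeholder here)


class MethodGraph:
    """Toy method-evolution graph: methods with years, typed directed edges."""

    def __init__(self):
        self.year = {}                 # method name -> first-appearance year
        self.out = defaultdict(list)   # method name -> outgoing edges

    def add_method(self, name, year):
        self.year[name] = year

    def add_edge(self, edge):
        self.out[edge.src].append(edge)

    def chains(self, start, max_len=5):
        """Enumerate maximal chains from `start` that never go back in time.

        Assumption: a plain depth-first enumeration, not the paper's
        self-guided temporal tree search.
        """
        results = []

        def walk(path):
            last = path[-1]
            successors = [
                e.dst
                for e in self.out.get(last, [])
                if self.year[e.dst] >= self.year[last] and e.dst not in path
            ]
            if not successors or len(path) == max_len:
                results.append(path)
                return
            for nxt in successors:
                walk(path + [nxt])

        walk([start])
        return results


# Hypothetical data, for illustration only.
g = MethodGraph()
for name, year in [("method_a", 2015), ("method_d", 2016),
                   ("method_b", 2017), ("method_c", 2019)]:
    g.add_method(name, year)

g.add_edge(Edge("method_a", "method_b", "extends", "placeholder quote 1"))
g.add_edge(Edge("method_b", "method_c", "addresses_bottleneck", "placeholder quote 2"))
g.add_edge(Edge("method_a", "method_d", "extends", "placeholder quote 3"))
# This edge points backwards in time, so chain enumeration skips it.
g.add_edge(Edge("method_c", "method_a", "compared_with", "placeholder quote 4"))

result = g.chains("method_a")
```

In this toy setup, each returned chain is a list of method names ordered by year; keeping the evidence string on every edge is what would let a consumer trace each step of a chain back to its source text.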