Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives
January 28, 2026
Authors: Tengyue Xu, Zhuoyang Qian, Gaoge Liu, Li Ling, Zhentao Zhang, Biao Wu, Shuo Zhang, Ke Lu, Wei Shi, Ziqi Wang, Zheng Feng, Yan Luo, Shu Xu, Yongjin Chen, Zhibo Feng, Zhuo Chen, Bruce Yuan, Harry Wang, Kris Chen
cs.AI
Abstract
Autonomous scientific discovery with large language model (LLM)-based agents has recently made substantial progress, demonstrating the ability to automate end-to-end research workflows. However, existing systems largely rely on runtime-centric execution paradigms, repeatedly reading, summarizing, and reasoning over large volumes of scientific literature online. This on-the-spot computation strategy incurs high computational cost, suffers from context window limitations, and often leads to brittle reasoning and hallucination. We propose Idea2Story, a pre-computation-driven framework for autonomous scientific discovery that shifts literature understanding from online reasoning to offline knowledge construction. Idea2Story continuously collects peer-reviewed papers together with their review feedback, extracts core methodological units, composes reusable research patterns, and organizes them into a structured methodological knowledge graph. At runtime, underspecified user research intents are aligned to established research paradigms, enabling efficient retrieval and reuse of high-quality research patterns instead of open-ended generation and trial-and-error. By grounding research planning and execution in a pre-built knowledge graph, Idea2Story alleviates the context window bottleneck of LLMs and substantially reduces repeated runtime reasoning over literature. We conduct qualitative analyses and preliminary empirical studies demonstrating that Idea2Story can generate coherent, methodologically grounded, and novel research patterns, and can produce several high-quality research demonstrations in an end-to-end setting. These results suggest that offline knowledge construction provides a practical and scalable foundation for reliable autonomous scientific discovery.
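The split between offline knowledge construction and runtime retrieval described above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `Paper` schema, the `build_graph` and `retrieve` functions, the numeric `review_score` standing in for review feedback, and the substring-based matching are hypothetical and are not taken from the Idea2Story implementation, which the abstract does not specify.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Paper:
    # Hypothetical record; the abstract does not define a schema.
    title: str
    method_units: list[str]
    review_score: float  # crude stand-in for peer-review feedback


@dataclass
class KnowledgeGraph:
    # method unit -> titles of papers that use it
    unit_to_papers: dict[str, set[str]] = field(
        default_factory=lambda: defaultdict(set)
    )
    # paper title -> (method units, review score), treated as one pattern
    patterns: dict[str, tuple[frozenset[str], float]] = field(default_factory=dict)


def build_graph(papers: list[Paper]) -> KnowledgeGraph:
    """Offline step: index method units once, before any user query arrives."""
    graph = KnowledgeGraph()
    for paper in papers:
        units = frozenset(u.lower() for u in paper.method_units)
        graph.patterns[paper.title] = (units, paper.review_score)
        for unit in units:
            graph.unit_to_papers[unit].add(paper.title)
    return graph


def retrieve(graph: KnowledgeGraph, intent: str, top_k: int = 2) -> list[str]:
    """Runtime step: match an intent against the prebuilt index only."""
    text = intent.lower()
    hits: dict[str, int] = defaultdict(int)
    for unit, titles in graph.unit_to_papers.items():
        if unit in text:  # naive substring match, an assumption of this sketch
            for title in titles:
                hits[title] += 1
    # Rank by number of matched units, then review score, then title.
    ranked = sorted(hits, key=lambda t: (-hits[t], -graph.patterns[t][1], t))
    return ranked[:top_k]


papers = [
    Paper("A", ["contrastive learning", "graph neural network"], 7.0),
    Paper("B", ["graph neural network"], 8.0),
    Paper("C", ["diffusion"], 6.0),
]
graph = build_graph(papers)
print(retrieve(graph, "Use contrastive learning with a graph neural network"))
```

In this sketch the runtime call touches only the prebuilt index, so its cost is independent of the length of the underlying papers; a real system would presumably replace the substring match with learned alignment between intents and research paradigms.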