Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives
January 28, 2026
Authors: Tengyue Xu, Zhuoyang Qian, Gaoge Liu, Li Ling, Zhentao Zhang, Biao Wu, Shuo Zhang, Ke Lu, Wei Shi, Ziqi Wang, Zheng Feng, Yan Luo, Shu Xu, Yongjin Chen, Zhibo Feng, Zhuo Chen, Bruce Yuan, Harry Wang, Kris Chen
cs.AI
Abstract
Autonomous scientific discovery with large language model (LLM)-based agents has recently made substantial progress, demonstrating the ability to automate end-to-end research workflows. However, existing systems largely rely on runtime-centric execution paradigms, repeatedly reading, summarizing, and reasoning over large volumes of scientific literature online. This on-the-spot computation strategy incurs high computational cost, suffers from context window limitations, and often leads to brittle reasoning and hallucination. We propose Idea2Story, a pre-computation-driven framework for autonomous scientific discovery that shifts literature understanding from online reasoning to offline knowledge construction. Idea2Story continuously collects peer-reviewed papers together with their review feedback, extracts core methodological units, composes reusable research patterns, and organizes them into a structured methodological knowledge graph. At runtime, underspecified user research intents are aligned to established research paradigms, enabling efficient retrieval and reuse of high-quality research patterns instead of open-ended generation and trial-and-error. By grounding research planning and execution in a pre-built knowledge graph, Idea2Story alleviates the context window bottleneck of LLMs and substantially reduces repeated runtime reasoning over literature. We conduct qualitative analyses and preliminary empirical studies demonstrating that Idea2Story can generate coherent, methodologically grounded, and novel research patterns, and can produce several high-quality research demonstrations in an end-to-end setting. These results suggest that offline knowledge construction provides a practical and scalable foundation for reliable autonomous scientific discovery.
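The abstract describes Idea2Story's offline/online split only at a high level. Below is a minimal Python sketch of that split under loudly stated assumptions. It is not the authors' implementation. Method units are assumed to be already extracted (the paper derives them from papers and their reviews; here they are supplied by hand). Each paper's full set of units is treated as one research pattern. Review feedback is reduced to a single hypothetical `review_score`. Runtime alignment uses Jaccard token overlap as a swapped-in stand-in for whatever intent-to-paradigm matching Idea2Story actually performs. All names (`Paper`, `KnowledgeGraph`, `retrieve`, `tokenize`) are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Paper:
    """Hypothetical record: a peer-reviewed paper and a summary of its reviews."""
    paper_id: str
    method_units: tuple[str, ...]  # assumed already extracted (by an LLM in practice)
    review_score: float            # hypothetical scalar stand-in for review feedback


@dataclass
class KnowledgeGraph:
    """Hypothetical offline store, built once before any user query arrives."""
    # Edge set: method unit -> ids of papers that use it.
    unit_to_papers: defaultdict[str, set[str]] = field(
        default_factory=lambda: defaultdict(set))
    # Research pattern (a set of co-occurring units) -> review scores observed for it.
    patterns: dict[frozenset[str], list[float]] = field(default_factory=dict)

    def add_paper(self, paper: Paper) -> None:
        """Offline step: link units to the paper and record its pattern."""
        for unit in paper.method_units:
            self.unit_to_papers[unit].add(paper.paper_id)
        key = frozenset(paper.method_units)
        self.patterns.setdefault(key, []).append(paper.review_score)

    def pattern_quality(self, key: frozenset[str]) -> float:
        """Mean review score of all papers that instantiate the pattern."""
        scores = self.patterns[key]
        return sum(scores) / len(scores)


def tokenize(text: str) -> set[str]:
    """Lowercase whitespace tokens with surrounding '.' and ',' stripped."""
    return {t for t in (tok.strip(".,").lower() for tok in text.split()) if t}


def retrieve(graph: KnowledgeGraph, intent: str,
             top_k: int = 2) -> list[tuple[frozenset[str], float]]:
    """Runtime step: rank stored patterns against a free-text intent.

    Score = Jaccard overlap (intent tokens vs. pattern tokens) x pattern quality.
    No literature is re-read here; only precomputed patterns are scanned.
    """
    intent_tokens = tokenize(intent)
    ranked: list[tuple[frozenset[str], float]] = []
    for key in graph.patterns:
        unit_tokens: set[str] = set().union(*(tokenize(u) for u in key))
        union = intent_tokens | unit_tokens
        if not union:
            continue
        overlap = len(intent_tokens & unit_tokens) / len(union)
        if overlap > 0:
            ranked.append((key, overlap * graph.pattern_quality(key)))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    graph = KnowledgeGraph()
    graph.add_paper(Paper("p1", ("contrastive pretraining", "graph encoder"), 7.0))
    graph.add_paper(Paper("p2", ("retrieval augmentation", "graph encoder"), 6.0))
    graph.add_paper(Paper("p3", ("contrastive pretraining", "graph encoder"), 8.0))
    for pattern, score in retrieve(graph, "contrastive pretraining for a graph encoder"):
        print(sorted(pattern), round(score, 2))
```

The sketch shows the design choice the abstract argues for. `add_paper` runs once per paper, offline, so the cost of understanding literature is paid ahead of time. At query time, `retrieve` only scans a compact set of precomputed, quality-weighted patterns instead of reading and summarizing papers inside the LLM's context window.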