

CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature

May 27, 2025
Authors: Noy Sternlicht, Tom Hope
cs.AI

Abstract

A hallmark of human innovation is the process of recombination -- creating original ideas by integrating elements of existing mechanisms and concepts. In this work, we automatically mine the scientific literature and build CHIMERA: a large-scale knowledge base (KB) of recombination examples. CHIMERA can be used to empirically explore at scale how scientists recombine concepts and take inspiration from different areas, or to train supervised machine learning models that learn to predict new creative cross-domain directions. To build this KB, we present a novel information extraction task of extracting recombination from scientific paper abstracts, collect a high-quality corpus of hundreds of manually annotated abstracts, and use it to train an LLM-based extraction model. The model is applied to a large corpus of papers in the AI domain, yielding a KB of over 28K recombination examples. We analyze CHIMERA to explore the properties of recombination in different subareas of AI. Finally, we train a scientific hypothesis generation model using the KB, which predicts new recombination directions that real-world researchers find inspiring. Our data and code are available at https://github.cs.huji.ac.il/tomhope-lab/CHIMERA

