CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature

Abstract

A hallmark of human innovation is recombination: creating original ideas by integrating elements of existing mechanisms and concepts. In this work, we automatically mine the scientific literature and build CHIMERA, a large-scale knowledge base (KB) of recombination examples. CHIMERA can be used to explore empirically, at scale, how scientists recombine concepts and take inspiration from different areas, or to train supervised machine learning models that learn to predict new creative cross-domain directions. To build this KB, we present a novel information extraction task of extracting recombination from scientific paper abstracts, collect a high-quality corpus of hundreds of manually annotated abstracts, and use it to train an LLM-based extraction model. We apply the model to a large corpus of papers in the AI domain, yielding a KB of over 28K recombination examples. We analyze CHIMERA to explore the properties of recombination in different subareas of AI. Finally, we use the KB to train a scientific hypothesis generation model, which predicts new recombination directions that real-world researchers find inspiring. Our data and code are available at https://github.cs.huji.ac.il/tomhope-lab/CHIMERA
