Spacer: Towards Engineered Scientific Inspiration

Abstract

Recent advances in LLMs have made automated scientific research the next frontier on the path to artificial superintelligence. However, existing systems are either confined to tasks of narrow scope or constrained by the limited creative capabilities of LLMs. We propose Spacer, a scientific discovery system that develops creative and factually grounded concepts without external intervention. Spacer attempts to achieve this via 'deliberate decontextualization,' an approach that disassembles information into atomic units - keywords - and draws creativity from unexplored connections between them. Spacer consists of (i) Nuri, an inspiration engine that builds keyword sets, and (ii) the Manifesting Pipeline, which refines these sets into elaborate scientific statements. Nuri extracts novel, high-potential keyword sets from a keyword graph built from 180,000 academic publications in biological fields. The Manifesting Pipeline finds links between keywords, analyzes their logical structure, validates their plausibility, and ultimately drafts original scientific concepts. In our experiments, Nuri's evaluation metric classifies high-impact publications with an AUROC score of 0.737. The Manifesting Pipeline also reconstructs core concepts from the latest top-journal articles solely from their keyword sets; an LLM-based scoring system estimates that this reconstruction was sound in over 85% of cases. Finally, our embedding space analysis shows that outputs from Spacer are significantly more similar to leading publications than those from SOTA LLMs.
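To make the idea of drawing on "unexplored connections" between keywords more concrete, the following is a minimal sketch, not Nuri's actual algorithm, which the abstract does not specify. It assumes each publication is reduced to a list of keywords, builds a co-occurrence graph, and ranks keyword pairs that never appear together by how many neighbors they share. The function names, the shared-neighbor heuristic, and the toy keywords are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_keyword_graph(papers):
    """Map each unordered keyword pair to the number of papers containing both."""
    edges = defaultdict(int)
    for keywords in papers:
        # Sort so that every pair is stored in one canonical order.
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return dict(edges)


def unexplored_pairs(edges):
    """Return keyword pairs that never co-occur, ranked by shared neighbors.

    The shared-neighbor count is an illustrative stand-in for a novelty
    score; it is an assumption, not the metric used by Spacer.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    candidates = []
    for a, b in combinations(sorted(neighbors), 2):
        if (a, b) not in edges:
            shared = len(neighbors[a] & neighbors[b])
            if shared > 0:
                candidates.append((shared, a, b))

    # Most shared neighbors first; ties broken alphabetically.
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(a, b, shared) for shared, a, b in candidates]


# Toy corpus: each inner list stands for the keywords of one publication.
papers = [
    ["autophagy", "mitochondria", "neurodegeneration"],
    ["mitochondria", "gut microbiome"],
    ["gut microbiome", "inflammation"],
    ["inflammation", "neurodegeneration"],
]
edges = build_keyword_graph(papers)
pairs = unexplored_pairs(edges)
for a, b, shared in pairs:
    print(f"{a} + {b}: {shared} shared neighbor(s)")
```

In this toy corpus, "gut microbiome" and "neurodegeneration" never co-occur but share two neighbors, so they surface as a candidate pair. A real system would presumably need a far richer scoring function and a downstream plausibility check, which is the role the abstract assigns to the Manifesting Pipeline.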
