Sci-Reasoning: A Dataset Decoding AI Innovation Patterns
January 8, 2026
Authors: Jiachen Liu, Maestro Harmon, Zechen Zhang
cs.AI
Abstract
While AI innovation is accelerating, the intellectual process behind breakthroughs -- how researchers identify gaps, synthesize prior work, and generate insights -- remains poorly understood. The lack of structured data on scientific reasoning hinders both systematic analysis and the development of AI research agents. We introduce Sci-Reasoning, the first dataset capturing the intellectual synthesis behind high-quality AI research. Using community-validated quality signals and an LLM-accelerated, human-verified pipeline, we trace Oral and Spotlight papers across NeurIPS, ICML, and ICLR (2023-2025) to their key predecessors, articulating specific reasoning links in a structured format. Our analysis identifies 15 distinct thinking patterns, with three dominant strategies together accounting for 52.7%: Gap-Driven Reframing (24.2%), Cross-Domain Synthesis (18.0%), and Representation Shift (10.5%). The most powerful innovation recipes combine multiple patterns: Gap-Driven Reframing + Representation Shift, Cross-Domain Synthesis + Representation Shift, and Gap-Driven Reframing + Cross-Domain Synthesis. This dataset enables quantitative studies of scientific progress and provides structured reasoning trajectories for training the next generation of AI research agents.
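As a quick sanity check on the figures quoted in the abstract, the short Python sketch below sums the three reported shares and enumerates the pairwise combinations of the dominant patterns. The variable names are illustrative assumptions, not part of the dataset or its tooling; only the pattern names and percentages come from the abstract.

```python
from itertools import combinations

# Shares (in percent) of the three dominant thinking patterns,
# copied from the abstract. Variable names are hypothetical.
dominant_shares = {
    "Gap-Driven Reframing": 24.2,
    "Cross-Domain Synthesis": 18.0,
    "Representation Shift": 10.5,
}

# Combined share of the three dominant patterns, rounded to one decimal
# to avoid floating-point noise.
combined = round(sum(dominant_shares.values()), 1)

# Every unordered pair of the three dominant patterns.
pairs = {frozenset(p) for p in combinations(dominant_shares, 2)}

# The three "innovation recipes" named in the abstract.
reported_recipes = {
    frozenset({"Gap-Driven Reframing", "Representation Shift"}),
    frozenset({"Cross-Domain Synthesis", "Representation Shift"}),
    frozenset({"Gap-Driven Reframing", "Cross-Domain Synthesis"}),
}

print(f"Combined share of dominant patterns: {combined}%")
print("Recipes cover all pairs of dominant patterns:", pairs == reported_recipes)
```

The check suggests the figures are internally consistent: the three shares add up to the reported 52.7%, and the three named recipes are exactly the possible pairings of the three dominant patterns.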