Sci-Reasoning: A Dataset Decoding AI Innovation Patterns

January 8, 2026
Authors: Jiachen Liu, Maestro Harmon, Zechen Zhang
cs.AI

Abstract

AI innovation is accelerating rapidly, yet the intellectual process behind breakthroughs (how researchers identify gaps, synthesize prior work, and generate insights) remains poorly understood. The lack of structured data on scientific reasoning hinders both the systematic analysis and the development of AI research agents. We introduce Sci-Reasoning, the first dataset capturing the intellectual synthesis behind high-quality AI research. Using community-validated quality signals and an LLM-accelerated, human-verified pipeline, we trace Oral and Spotlight papers from NeurIPS, ICML, and ICLR (2023-2025) to their key predecessors, articulating the specific reasoning links in a structured format. Our analysis identifies 15 distinct thinking patterns, with three dominant strategies together accounting for 52.7%: Gap-Driven Reframing (24.2%), Cross-Domain Synthesis (18.0%), and Representation Shift (10.5%). The most powerful innovation recipes combine multiple patterns: Gap-Driven Reframing + Representation Shift, Cross-Domain Synthesis + Representation Shift, and Gap-Driven Reframing + Cross-Domain Synthesis. This dataset enables quantitative studies of scientific progress and provides structured reasoning trajectories for training the next generation of AI research agents.
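As a rough illustration of the kind of quantitative analysis the abstract describes, the sketch below tallies pattern shares and pattern co-occurrences over a handful of toy records. Everything in it is an assumption made for illustration: the record layout, the field names `paper` and `patterns`, the convention that the first-listed pattern is the primary one, and the example entries themselves. None of it is taken from the actual Sci-Reasoning schema or data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records: each paper is tagged with one or more thinking
# patterns. Field names and entries are illustrative assumptions only.
records = [
    {"paper": "A", "patterns": ["Gap-Driven Reframing", "Representation Shift"]},
    {"paper": "B", "patterns": ["Cross-Domain Synthesis"]},
    {"paper": "C", "patterns": ["Gap-Driven Reframing", "Cross-Domain Synthesis"]},
    {"paper": "D", "patterns": ["Gap-Driven Reframing"]},
]


def pattern_shares(records):
    """Percentage of papers per primary pattern (assumed: first-listed)."""
    counts = Counter(r["patterns"][0] for r in records)
    total = sum(counts.values())
    return {p: round(100 * c / total, 1) for p, c in counts.items()}


def pattern_pairs(records):
    """Count how often two patterns appear together on the same paper."""
    counts = Counter()
    for r in records:
        # Sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(r["patterns"])), 2):
            counts[pair] += 1
    return counts


shares = pattern_shares(records)
pairs = pattern_pairs(records)
```

On real data, the same two tallies would correspond to the per-pattern percentages and the multi-pattern combinations reported in the abstract, provided the dataset exposes pattern labels per paper in some comparable form.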