SciDER: Scientific Data-centric End-to-end Researcher
March 2, 2026
Authors: Ke Lin, Yilin Lu, Shreyas Bhat, Xuehang Guo, Junier Oliva, Qingyun Wang
cs.AI
Abstract
Automated scientific discovery with large language models is transforming the research lifecycle from ideation to experimentation, yet existing agents still struggle to autonomously process the raw data collected from scientific experiments. We introduce SciDER, a data-centric end-to-end system that automates the research lifecycle. Unlike traditional frameworks, SciDER uses specialized agents that collaboratively parse and analyze raw scientific data, generate hypotheses and experimental designs grounded in the specific characteristics of that data, and write and execute the corresponding code. Evaluation on three benchmarks shows that SciDER, through its self-evolving memory and critic-led feedback loop, excels at specialized data-driven scientific discovery and outperforms general-purpose agents and state-of-the-art models. SciDER is distributed as a modular Python package, and we also provide easy-to-use PyPI packages with a lightweight web interface, to accelerate autonomous, data-driven research and make it accessible to all researchers and developers.