Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents

Abstract

The scientific method has been the cornerstone of human progress for centuries. However, the increasing complexity of modern scientific experiments poses significant challenges to their reproducibility and scalability. This paper introduces Curie, an AI agent designed to automate and rigorously execute scientific experiments. Curie integrates state-of-the-art machine learning techniques with automated laboratory equipment to conduct experiments with minimal human intervention. We demonstrate Curie's capabilities through a series of case studies spanning multiple scientific domains, showcasing its ability to generate reproducible results, optimize experimental parameters, and discover novel insights. Our results suggest that AI agents like Curie can significantly accelerate scientific discovery while maintaining rigorous experimental standards.

Abstract

Scientific experimentation, a cornerstone of human progress, demands rigor in reliability, methodical control, and interpretability to yield meaningful results. Despite the growing capabilities of large language models (LLMs) in automating different aspects of the scientific process, automating rigorous experimentation remains a significant challenge. To address this gap, we propose Curie, an AI agent framework designed to embed rigor into the experimentation process through three key components: an intra-agent rigor module to enhance reliability, an inter-agent rigor module to maintain methodical control, and an experiment knowledge module to enhance interpretability. To evaluate Curie, we design a novel experimental benchmark composed of 46 questions across four computer science domains, derived from influential research papers and widely adopted open-source projects. Compared to the strongest baseline tested, we achieve a 3.4× improvement in correctly answering experimental questions. Curie is open-sourced at https://github.com/Just-Curieous/Curie.