Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents

Abstract

Scientific experimentation, a cornerstone of human progress, demands rigor in reliability, methodical control, and interpretability to yield meaningful results. Despite the growing capabilities of large language models (LLMs) in automating different aspects of the scientific process, automating rigorous experimentation remains a significant challenge. To address this gap, we propose Curie, an AI agent framework designed to embed rigor into the experimentation process through three key components: an intra-agent rigor module to enhance reliability, an inter-agent rigor module to maintain methodical control, and an experiment knowledge module to enhance interpretability. To evaluate Curie, we design a novel experimentation benchmark of 46 questions across four computer science domains, derived from influential research papers and widely adopted open-source projects. Compared to the strongest baseline tested, Curie achieves a 3.4× improvement in correctly answering experimental questions. Curie is open-sourced at https://github.com/Just-Curieous/Curie.
