MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback
May 23, 2025
Authors: Wanhao Liu, Zonglin Yang, Jue Wang, Lidong Bing, Di Zhang, Dongzhan Zhou, Yuqiang Li, Houqiang Li, Erik Cambria, Wanli Ouyang
cs.AI
Abstract
Hypothesis ranking is a crucial component of automated scientific discovery,
particularly in natural sciences where wet-lab experiments are costly and
throughput-limited. Existing approaches focus on pre-experiment ranking,
relying solely on the internal reasoning of large language models without
incorporating empirical outcomes from experiments. We introduce the task of
experiment-guided ranking, which aims to prioritize candidate hypotheses based
on the results of previously tested ones. However, developing such strategies
is challenging due to the impracticality of repeatedly conducting real
experiments in natural science domains. To address this, we propose a simulator
grounded in three domain-informed assumptions, modeling hypothesis performance
as a function of similarity to a known ground-truth hypothesis, perturbed by
noise. We curate a dataset of 124 chemistry hypotheses with experimentally
reported outcomes to validate the simulator. Building on this simulator, we
develop a pseudo experiment-guided ranking method that clusters hypotheses by
shared functional characteristics and prioritizes candidates based on insights
derived from simulated experimental feedback. Experiments show that our method
outperforms pre-experiment baselines and strong ablations.
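
To make the two ideas in the abstract concrete, here is a minimal toy sketch in Python. It is not the authors' simulator or ranking method. The abstract does not state the three domain-informed assumptions, so none of them are modeled here. Everything specific is an assumption made for illustration: hypotheses are represented as lists of component names, similarity is Jaccard overlap, noise is Gaussian, cluster labels are supplied by hand, and the names `simulate_experiment` and `feedback_guided_search` are hypothetical.

```python
import random
from typing import Callable, Dict, Optional, Sequence


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed similarity measure: overlap between two component sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def simulate_experiment(
    hypothesis: Sequence[str],
    ground_truth: Sequence[str],
    noise_std: float = 0.05,
    rng: Optional[random.Random] = None,
) -> float:
    """Simulated outcome: similarity to the ground truth plus Gaussian noise.

    The result is clipped to [0, 1]. Both the similarity function and the
    noise model are illustrative assumptions.
    """
    rng = rng or random.Random()
    score = jaccard(hypothesis, ground_truth) + rng.gauss(0.0, noise_std)
    return min(1.0, max(0.0, score))


def feedback_guided_search(
    candidates: Dict[str, Sequence[str]],
    clusters: Dict[str, str],
    run_experiment: Callable[[Sequence[str]], float],
    budget: int,
) -> Dict[str, float]:
    """Test up to `budget` hypotheses, using earlier results to pick the next.

    Each step picks the untested hypothesis whose cluster has the highest
    mean observed score. Clusters with no tested member get an optimistic
    default of 1.0, so each cluster is probed once before it is judged.
    Returns the observed scores in the order the hypotheses were tested.
    """
    results: Dict[str, float] = {}
    while len(results) < min(budget, len(candidates)):
        # Group the scores observed so far by cluster.
        by_cluster: Dict[str, list] = {}
        for name, score in results.items():
            by_cluster.setdefault(clusters[name], []).append(score)
        means = {c: sum(v) / len(v) for c, v in by_cluster.items()}

        untested = [n for n in candidates if n not in results]
        # Ties go to the earliest candidate in insertion order.
        next_name = max(untested, key=lambda n: means.get(clusters[n], 1.0))
        results[next_name] = run_experiment(candidates[next_name])
    return results


if __name__ == "__main__":
    ground_truth = ["A", "B", "C"]
    candidates = {
        "h1": ["A", "X"],
        "h2": ["A", "B"],
        "h3": ["A", "B", "C"],
        "h4": ["X", "Y"],
    }
    clusters = {"h1": "c1", "h2": "c2", "h3": "c2", "h4": "c1"}
    rng = random.Random(0)
    tested = feedback_guided_search(
        candidates,
        clusters,
        lambda h: simulate_experiment(h, ground_truth, noise_std=0.05, rng=rng),
        budget=3,
    )
    for name, score in tested.items():
        print(name, round(score, 3))
```

In this toy setup, a low score for one member of a cluster lowers the priority of its untested siblings, while a high score raises it. This is a simplified stand-in for prioritizing candidates using insights from simulated feedback.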