MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback

Abstract

Hypothesis ranking is a crucial component of automated scientific discovery, particularly in the natural sciences, where wet-lab experiments are costly and throughput-limited. Existing approaches focus on pre-experiment ranking, relying solely on the internal reasoning of large language models without incorporating empirical outcomes from experiments. We introduce the task of experiment-guided ranking, which aims to prioritize candidate hypotheses based on the results of previously tested ones. However, developing such strategies is challenging because repeatedly conducting real experiments is impractical in natural science domains. To address this, we propose a simulator grounded in three domain-informed assumptions, modeling hypothesis performance as a function of similarity to a known ground truth hypothesis, perturbed by noise. We curate a dataset of 124 chemistry hypotheses with experimentally reported outcomes to validate the simulator. Building on this simulator, we develop a pseudo experiment-guided ranking method that clusters hypotheses by shared functional characteristics and prioritizes candidates based on insights derived from simulated experimental feedback. Experiments show that our method outperforms pre-experiment baselines and strong ablations.
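To make the simulator idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy Jaccard similarity over sets of hypothesis components and zero-mean Gaussian noise. The abstract specifies neither the similarity measure, the noise model, nor the three domain-informed assumptions, so the names `jaccard`, `simulate_feedback`, and `noise_std` are all hypothetical.

```python
import random


def jaccard(a: set, b: set) -> float:
    """Toy similarity (an assumption): overlap between two component sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def simulate_feedback(candidate: set, ground_truth: set,
                      noise_std: float = 0.05, rng=None) -> float:
    """Simulated experimental score for a candidate hypothesis.

    The score is the similarity to the ground truth hypothesis plus
    zero-mean Gaussian noise (assumed noise model), clipped to [0, 1].
    """
    rng = rng or random.Random()
    score = jaccard(candidate, ground_truth) + rng.gauss(0.0, noise_std)
    return min(1.0, max(0.0, score))


if __name__ == "__main__":
    # Hypothetical component labels, used only for illustration.
    truth = {"A", "B", "C"}
    candidates = [{"A", "B"}, {"A", "D"}, {"A", "B", "C"}]
    rng = random.Random(0)
    # Rank candidates by their simulated feedback, highest first.
    ranked = sorted(candidates,
                    key=lambda c: simulate_feedback(c, truth, rng=rng),
                    reverse=True)
    print(ranked)
```

In this sketch, feedback from already "tested" candidates could be used to reorder the remaining ones, which is the setting that experiment-guided ranking targets. The clustering by shared functional characteristics described above is not modeled here.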
