

MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback

May 23, 2025
作者: Wanhao Liu, Zonglin Yang, Jue Wang, Lidong Bing, Di Zhang, Dongzhan Zhou, Yuqiang Li, Houqiang Li, Erik Cambria, Wanli Ouyang
cs.AI

Abstract

Hypothesis ranking is a crucial component of automated scientific discovery, particularly in the natural sciences, where wet-lab experiments are costly and throughput-limited. Existing approaches focus on pre-experiment ranking, relying solely on large language models' internal reasoning without incorporating empirical outcomes from experiments. We introduce the task of experiment-guided ranking, which aims to prioritize candidate hypotheses based on the results of previously tested ones. However, developing such strategies is challenging because repeatedly conducting real experiments in natural science domains is impractical. To address this, we propose a simulator grounded in three domain-informed assumptions, modeling hypothesis performance as a function of similarity to a known ground-truth hypothesis, perturbed by noise. We curate a dataset of 124 chemistry hypotheses with experimentally reported outcomes to validate the simulator. Building on this simulator, we develop a pseudo experiment-guided ranking method that clusters hypotheses by shared functional characteristics and prioritizes candidates based on insights derived from simulated experimental feedback. Experiments show that our method outperforms pre-experiment baselines and strong ablations.
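The core idea of the simulator and of experiment-guided ranking can be sketched in a few lines. The sketch below is a simplification under stated assumptions, not the paper's implementation: hypotheses are represented as sets of feature tokens (the paper works with free-text chemistry hypotheses), similarity is Jaccard overlap with the hidden ground truth, and the ranking loop greedily prefers candidates that share features with the best-scoring hypothesis tested so far. The function names and representations are illustrative.

```python
import random


def simulate_experiment(candidate, ground_truth, noise=0.1, rng=random):
    """Simulated feedback: performance is modeled as similarity to the
    (hidden) ground-truth hypothesis, perturbed by noise.  Hypotheses
    are sets of feature tokens here, an assumption for illustration."""
    overlap = len(candidate & ground_truth)
    similarity = overlap / max(len(candidate | ground_truth), 1)
    return max(0.0, min(1.0, similarity + rng.uniform(-noise, noise)))


def experiment_guided_ranking(candidates, ground_truth, budget, rng=random):
    """Greedy experiment-guided loop: test one hypothesis at a time,
    re-ordering the untested pool by feature overlap with the
    best-scoring hypothesis observed so far."""
    tested, remaining = [], list(candidates)
    for _ in range(min(budget, len(remaining))):
        if tested:
            # Prioritize candidates similar to the current best result.
            best = max(tested, key=lambda t: t[1])[0]
            remaining.sort(key=lambda h: len(h & best), reverse=True)
        h = remaining.pop(0)
        tested.append((h, simulate_experiment(h, ground_truth, rng=rng)))
    # Final ranking: tested hypotheses ordered by simulated performance.
    return sorted(tested, key=lambda t: t[1], reverse=True)
```

With `noise=0.0` the simulator reduces to plain Jaccard similarity, which makes the feedback loop easy to inspect; the paper's simulator is validated against reported outcomes for 124 chemistry hypotheses rather than this toy set representation.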

