Unleashing Scientific Reasoning for Bio-experimental Protocol Generation via Structured Component-based Reward Mechanism
October 17, 2025
Authors: Haoran Sun, Yankai Jiang, Zhenyu Tang, Yaning Pan, Shuang Gu, Zekai Lin, Lilong Wang, Wenjie Lou, Lei Liu, Lei Bai, Xiaosong Wang
cs.AI
Abstract
The foundation of reproducible science lies in protocols that are precise,
logically ordered, and executable. The autonomous generation of these protocols
through natural language queries could greatly improve the efficiency of the
reproduction process. However, current leading large language models (LLMs)
often generate incomplete or inconsistent protocols, limiting their utility. To
address this limitation, we first introduce SciRecipe, a large-scale dataset of
over 12K structured protocols spanning 27 biological subfields and encompassing
both comprehension and problem-solving tasks. To further improve protocol
generation, we propose the "Sketch-and-Fill" paradigm, which separates
analysis, structuring, and expression to ensure each step is explicit and
verifiable. Complementing this, the structured component-based reward mechanism
evaluates step granularity, action order, and semantic fidelity, aligning model
optimization with experimental reliability. Building on these components, we
develop Thoth, trained through a staged Knowledge-to-Action process that
progresses from knowledge acquisition to operational reasoning and ultimately
to robust, executable protocol generation. Across multiple benchmarks, Thoth
consistently surpasses both proprietary and open-source LLMs, achieving
significant improvements in step alignment, logical sequencing, and semantic
accuracy. Our approach paves the way for reliable scientific assistants that
bridge knowledge with experimental execution. All data, code, and models will
be released publicly.
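The abstract describes a reward that combines three component scores: step granularity, action order, and semantic fidelity. The paper does not give formulas, so the following is only a minimal, hypothetical sketch of how such a composite reward could be structured as a weighted sum; all function names, the weighting, and the use of `difflib.SequenceMatcher` as a stand-in similarity measure are assumptions, not the authors' actual mechanism.

```python
# Hypothetical sketch of a structured component-based reward for
# protocol generation. Assumed design: a weighted sum of three
# sub-scores, each in [0, 1].

from difflib import SequenceMatcher

def step_granularity(pred_steps, ref_steps):
    # Penalize over- or under-segmentation relative to the reference
    # protocol by comparing step counts.
    return min(len(pred_steps), len(ref_steps)) / max(len(pred_steps), len(ref_steps))

def action_order(pred_actions, ref_actions):
    # Reward correctly ordered action sequences (e.g., "mix" before
    # "incubate") via longest-matching-subsequence similarity.
    return SequenceMatcher(None, pred_actions, ref_actions).ratio()

def semantic_fidelity(pred_text, ref_text):
    # Placeholder: surface similarity standing in for whatever
    # embedding- or judge-based semantic score the paper uses.
    return SequenceMatcher(None, pred_text, ref_text).ratio()

def protocol_reward(pred_steps, ref_steps, weights=(0.3, 0.4, 0.3)):
    # Lead action of each step = its first word (an illustrative choice).
    g = step_granularity(pred_steps, ref_steps)
    o = action_order([s.split()[0] for s in pred_steps],
                     [s.split()[0] for s in ref_steps])
    s = semantic_fidelity(" ".join(pred_steps), " ".join(ref_steps))
    return weights[0] * g + weights[1] * o + weights[2] * s

pred = ["mix reagents", "incubate 30 min", "centrifuge sample"]
ref  = ["mix reagents", "incubate 30 min", "centrifuge sample"]
assert abs(protocol_reward(pred, ref) - 1.0) < 1e-9  # perfect match scores 1
```

A reward of this shape makes each failure mode separately visible during optimization: a protocol with the right content but wrong step boundaries loses granularity score, while one with the right steps in the wrong order loses order score, which matches the abstract's goal of aligning model optimization with experimental reliability.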