MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier
March 4, 2026
Authors: Zonglin Yang, Lidong Bing
cs.AI
Abstract
While large language models (LLMs) show promise in scientific discovery, existing research focuses on inference or feedback-driven training, leaving the direct modeling of the generative reasoning process, P(hypothesis|background) (P(h|b)), unexplored. We demonstrate that directly training P(h|b) is mathematically intractable due to the combinatorial complexity (O(N^k)) inherent in retrieving and composing inspirations from a vast knowledge base. To break this barrier, we introduce MOOSE-Star, a unified framework enabling tractable training and scalable inference. In the best case, MOOSE-Star reduces complexity from exponential to logarithmic (O(log N)) by (1) training on decomposed subtasks derived from the probabilistic equation of discovery, (2) employing motivation-guided hierarchical search to enable logarithmic retrieval and prune irrelevant subspaces, and (3) utilizing bounded composition for robustness against retrieval noise. To facilitate this, we release TOMATO-Star, a dataset of 108,717 decomposed papers (38,400 GPU hours of compute) for training. Furthermore, we show that while brute-force sampling hits a "complexity wall," MOOSE-Star exhibits continuous test-time scaling.
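To make the complexity claim concrete, here is a minimal, hypothetical sketch of how a motivation-guided hierarchical search can turn an O(N) scan of a knowledge base into an O(log N) tree descent: each internal node carries a summary of its subtree, and at every level the walk keeps only the child whose summary best matches the motivation, pruning the rest of the subspace. This is purely illustrative; the paper's actual method, data structures, and names (`Node`, `build_tree`, `retrieve`, the 1-D "embedding") are not from the source.

```python
# Illustrative sketch (NOT the paper's implementation): motivation-guided
# hierarchical retrieval over a balanced tree of "inspirations".
# Toy setup: each inspiration has a 1-D embedding; internal nodes store the
# mean embedding of their subtree as a cheap summary.

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A cluster node: leaves hold one inspiration, internal nodes a summary."""
    summary: float                  # toy 1-D embedding of the cluster
    item: Optional[str] = None      # payload, present only at leaves
    children: List["Node"] = field(default_factory=list)


def build_tree(items: List[tuple]) -> Node:
    """Build a balanced binary tree over (embedding, text) pairs."""
    items = sorted(items, key=lambda x: x[0])

    def rec(lo: int, hi: int) -> Node:
        if hi - lo == 1:
            emb, text = items[lo]
            return Node(summary=emb, item=text)
        mid = (lo + hi) // 2
        left, right = rec(lo, mid), rec(mid, hi)
        return Node(summary=(left.summary + right.summary) / 2,
                    children=[left, right])

    return rec(0, len(items))


def retrieve(root: Node, motivation: float):
    """Descend toward the child whose summary best matches the motivation.

    Each level discards half of the remaining subspace, so the number of
    comparisons grows as O(log N) in the size of the knowledge base,
    instead of the O(N) cost of scoring every item.
    """
    node, steps = root, 0
    while node.children:
        node = min(node.children, key=lambda c: abs(c.summary - motivation))
        steps += 1
    return node.item, steps
```

Note the trade-off this sketch makes explicit: greedy descent by cluster summary can miss the globally best item when clusters overlap, which is presumably why the abstract pairs hierarchical retrieval with bounded composition for robustness against retrieval noise.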