Toward Diverse Scientific Hypothesis Search with Large Language Models

Abstract

Large language models (LLMs) are increasingly used to accelerate scientific discovery, most recently for advanced tasks such as generating valid scientific hypotheses. In many discovery settings, however, validation can be noisy and expensive, so the goal is not to identify a single best hypothesis. Scientists instead benefit from a set of high-quality alternative hypotheses that hedges against downstream uncertainty about which solution is best. Commonly used evolutionary search recipes nevertheless tend to prioritize optimization over exploration in hypothesis generation, and the resulting selection pressure during search leads to diversity collapse. Motivated by these limitations, we formulate hypothesis search as a sampling problem, where the objective is to efficiently produce diverse, high-quality hypotheses under a fixed validation budget. Building on this perspective, we propose \ours, an evolutionary framework inspired by the classical parallel tempering algorithm. It searches for hypotheses at multiple temperature levels and enables principled information exchange across temperatures, improving exploration without disrupting convergence. Across domains including molecular discovery, equation discovery, and algorithm discovery, our approach consistently improves both hypothesis quality and diversity under the same validation budget, and it produces candidates that remain robust under more expensive downstream computational validation.
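
The classical parallel tempering algorithm that the abstract cites as inspiration can be sketched on a toy problem. The Python below illustrates only that classical algorithm, not the framework proposed here: the one-dimensional `score` function, the temperature ladder, the Gaussian proposal, and every function name are hypothetical choices made for this example. Each chain targets a density proportional to exp(score / temperature), and adjacent chains exchange states with the standard swap acceptance probability.

```python
import math
import random


def score(x):
    # Hypothetical toy objective with two peaks, near x = -2 and x = 2.
    return math.exp(-(x - 2.0) ** 2) + math.exp(-(x + 2.0) ** 2)


def swap_probability(score_i, score_j, temp_i, temp_j):
    # Standard parallel tempering swap acceptance for chains whose targets
    # are proportional to exp(score / temperature).
    log_ratio = (1.0 / temp_i - 1.0 / temp_j) * (score_j - score_i)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)


def parallel_tempering(temps, steps, step_size=0.5, seed=0):
    # temps is ordered from coldest (index 0) to hottest.
    rng = random.Random(seed)
    states = [rng.uniform(-4.0, 4.0) for _ in temps]
    cold_samples = []
    for _ in range(steps):
        # Metropolis update inside each chain at its own temperature.
        for k, temp in enumerate(temps):
            proposal = states[k] + rng.gauss(0.0, step_size)
            delta = score(proposal) - score(states[k])
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                states[k] = proposal
        # Attempt one swap between a randomly chosen adjacent pair.
        k = rng.randrange(len(temps) - 1)
        p = swap_probability(
            score(states[k]), score(states[k + 1]), temps[k], temps[k + 1]
        )
        if rng.random() < p:
            states[k], states[k + 1] = states[k + 1], states[k]
        # Record the state of the coldest chain after each sweep.
        cold_samples.append(states[0])
    return cold_samples


if __name__ == "__main__":
    samples = parallel_tempering(temps=[0.05, 0.2, 1.0], steps=2000)
    near_left = sum(1 for x in samples if x < 0)
    print(f"cold-chain samples left of 0: {near_left} of {len(samples)}")
```

A swap that moves the higher-scoring state into the colder chain is always accepted, while the reverse swap is accepted only with a probability that shrinks as the score gap and the temperature gap grow. In the classical algorithm, this is how hot chains pass what they find to cold chains without changing the distribution each chain samples from.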