Sci-CoE: Co-evolving Scientific Reasoning LLMs via Geometric Consensus with Sparse Supervision
February 12, 2026
Authors: Xiaohan He, Shiyang Feng, Songtao Huang, Lei Bai, Bin Wang, Bo Zhang
cs.AI
Abstract
Large language models (LLMs) have demonstrated exceptional reasoning capabilities, and co-evolving paradigms have shown promising results in domains such as code and math. However, in scientific reasoning tasks, these models remain fragile due to unreliable solution evaluation and limited diversity in verification strategies. In this work, we propose Sci-CoE, a two-stage scientific co-evolving framework that enables models to self-evolve as both solver and verifier through a transition from sparse supervision to unsupervised learning. In the first stage, the model uses a small set of annotated data to establish fundamental correctness-judgment anchors for the verifier. In the second stage, we introduce a geometric reward mechanism that jointly considers consensus, reliability, and diversity, driving large-scale self-iteration on unlabeled data. Experiments on several general scientific benchmarks demonstrate that Sci-CoE enhances complex reasoning capabilities and exhibits strong scalability, facilitating the construction of more robust and diverse evaluation systems. Code is available at https://github.com/InternScience/Sci-CoE.
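The abstract does not define how the three verification signals are combined; a natural reading of a "geometric" reward is a geometric mean over per-sample consensus, reliability, and diversity scores, which goes to zero if any single criterion fails. The sketch below illustrates that reading only; the function name, score ranges, and aggregation rule are assumptions, not the paper's actual formulation.

```python
import math

def geometric_reward(consensus: float, reliability: float, diversity: float) -> float:
    """Hypothetical aggregation of the three verifier signals.

    Each score is assumed to lie in [0, 1]. The geometric mean rewards
    solutions only when all three criteria are simultaneously satisfied:
    a zero on any axis zeroes out the whole reward, unlike an arithmetic
    mean, which would let strong consensus mask zero diversity.
    """
    for name, s in (("consensus", consensus),
                    ("reliability", reliability),
                    ("diversity", diversity)):
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"{name} score must lie in [0, 1], got {s}")
    return (consensus * reliability * diversity) ** (1.0 / 3.0)

# A solution with broad agreement but no verifier diversity earns nothing:
r_degenerate = geometric_reward(0.9, 0.8, 0.0)   # → 0.0
# Balanced scores yield a reward close to their common level:
r_balanced = geometric_reward(0.8, 0.9, 0.7)     # ≈ 0.796
```

This multiplicative coupling is one plausible motivation for calling the reward "geometric": it prevents the solver from gaming a single criterion during unsupervised self-iteration.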