SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines

Abstract

We present a scientific reasoning foundation model that aligns natural language with heterogeneous scientific representations. The model is pretrained on a 206B-token corpus spanning scientific text, pure sequences, and sequence-text pairs. It is then aligned in three stages: supervised fine-tuning (SFT) on 40M instructions, annealed cold-start bootstrapping to elicit long-form chain-of-thought, and reinforcement learning with task-specific reward shaping, which together instill deliberate scientific reasoning. The model supports five capability families covering up to 103 tasks across workflows: (i) faithful translation between text and scientific formats, (ii) text/knowledge extraction, (iii) property prediction, (iv) property classification, and (v) unconditional and conditional sequence generation and design. Compared with specialist systems, our approach broadens instruction coverage, improves cross-domain generalization, and enhances fidelity. We detail data curation and training, and show that cross-discipline learning strengthens transfer and downstream reliability. The model, instruction-tuning datasets, and evaluation code are open-sourced at https://huggingface.co/SciReason and https://github.com/open-sciencelab/SciReason.
