Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

March 19, 2026
Authors: Zhuolin Yang, Zihan Liu, Yang Chen, Wenliang Dai, Boxin Wang, Sheng-Chieh Lin, Chankyu Lee, Yangyi Chen, Dongfu Jiang, Jiafan He, Renjie Pi, Grace Lam, Nayeon Lee, Alexander Bukharin, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping
cs.AI

Abstract

We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. It is the second open-weight LLM, after DeepSeekV3.2-Speciale-671B-A37B, to achieve Gold Medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICPC World Finals, demonstrating remarkably high intelligence density with 20x fewer parameters. In contrast to Nemotron-Cascade 1, the key technical advancements are as follows. After SFT on a meticulously curated dataset, we substantially expand Cascade RL to cover a much broader spectrum of reasoning and agentic domains. Furthermore, we introduce multi-domain on-policy distillation from the strongest intermediate teacher models for each domain throughout the Cascade RL process, allowing us to efficiently recover benchmark regressions and sustain strong performance gains along the way. We release our collection of model checkpoints and training data.
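The abstract names multi-domain on-policy distillation, in which the student learns from its own rollouts against a per-domain teacher's token distributions, but gives no implementation details. Below is a minimal sketch of one generic on-policy distillation step, assuming Hugging Face-style causal-LM interfaces and a per-token reverse-KL objective; all names (`on_policy_distill_step`, `student`, `teacher`) are illustrative, not the paper's code.

```python
import torch
import torch.nn.functional as F

def on_policy_distill_step(student, teacher, prompt_ids, max_new_tokens=256):
    """One on-policy distillation step (sketch): the student generates its
    own rollouts, then is trained to match the teacher's per-token
    distribution on those same tokens via reverse KL(student || teacher)."""
    # 1) Sample rollouts from the *student* -- this is what makes the
    #    distillation "on-policy" (teacher is only used for scoring).
    with torch.no_grad():
        rollouts = student.generate(prompt_ids, max_new_tokens=max_new_tokens)

    # 2) Score the rollouts with both models.
    student_logits = student(rollouts).logits          # gradients flow here
    with torch.no_grad():
        teacher_logits = teacher(rollouts).logits      # fixed target

    # 3) Per-token reverse KL over the vocabulary. For brevity this
    #    averages over all positions; a real implementation would mask
    #    out the prompt tokens and padding.
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    kl = (s_logp.exp() * (s_logp - t_logp)).sum(dim=-1)   # [batch, seq]
    loss = kl.mean()
    loss.backward()
    return loss.item()
```

Per the abstract, such distillation steps would interleave with the Cascade RL stages, with the strongest intermediate checkpoint for each domain serving as that domain's teacher.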