Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

Abstract

We introduce Nemotron-Cascade 2, an open 30B Mixture-of-Experts (MoE) model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. It is the second open-weight LLM, after DeepSeekV3.2-Speciale-671B-A37B, to achieve gold-medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICPC World Finals, demonstrating remarkably high intelligence density with 20x fewer parameters. Compared with Nemotron-Cascade 1, the key technical advances are as follows. After supervised fine-tuning (SFT) on a meticulously curated dataset, we substantially expand Cascade RL to cover a much broader spectrum of reasoning and agentic domains. Furthermore, we introduce multi-domain on-policy distillation throughout the Cascade RL process, using the strongest intermediate teacher model for each domain, which allows us to efficiently recover benchmark regressions and sustain strong performance gains along the way. We release the collection of model checkpoints and training data.
