Llama-Nemotron: Efficient Reasoning Models

Abstract

We introduce the Llama-Nemotron series of models, an open family of heterogeneous reasoning models that combine exceptional reasoning capabilities and inference efficiency with an open license for enterprise use. The family comes in three sizes -- Nano (8B), Super (49B), and Ultra (253B) -- and performs competitively with state-of-the-art reasoning models such as DeepSeek-R1 while offering superior inference throughput and memory efficiency. In this report, we discuss the training procedure for these models, which entails neural architecture search starting from Llama 3 models for accelerated inference, knowledge distillation, and continued pretraining, followed by a reasoning-focused post-training stage consisting of two main parts: supervised fine-tuning and large-scale reinforcement learning. Llama-Nemotron models are the first open-source models to support a dynamic reasoning toggle, allowing users to switch between standard chat and reasoning modes during inference. To further support open research and facilitate model development, we provide the following resources:

1. We release the Llama-Nemotron reasoning models -- LN-Nano, LN-Super, and LN-Ultra -- under the commercially permissive NVIDIA Open Model License Agreement.
2. We release the complete post-training dataset: Llama-Nemotron-Post-Training-Dataset.
3. We also release our training codebases: NeMo, NeMo-Aligner, and Megatron-LM.