Llama-Nemotron: Efficient Reasoning Models
May 2, 2025
Authors: Akhiad Bercovich, Itay Levy, Izik Golan, Mohammad Dabbah, Ran El-Yaniv, Omri Puny, Ido Galil, Zach Moshe, Tomer Ronen, Najeeb Nabwani, Ido Shahaf, Oren Tropp, Ehud Karpas, Ran Zilberstein, Jiaqi Zeng, Soumye Singhal, Alexander Bukharin, Yian Zhang, Tugrul Konuk, Gerald Shen, Ameya Sunil Mahabaleshwarkar, Bilal Kartal, Yoshi Suhara, Olivier Delalleau, Zijia Chen, Zhilin Wang, David Mosallanezhad, Adi Renduchintala, Haifeng Qian, Dima Rekesh, Fei Jia, Somshubra Majumdar, Vahid Noroozi, Wasi Uddin Ahmad, Sean Narenthiran, Aleksander Ficek, Mehrzad Samadi, Jocelyn Huang, Siddhartha Jain, Igor Gitman, Ivan Moshkov, Wei Du, Shubham Toshniwal, George Armstrong, Branislav Kisacanin, Matvei Novikov, Daria Gitman, Evelina Bakhturina, Jane Polak Scowcroft, John Kamalu, Dan Su, Kezhi Kong, Markus Kliegl, Rabeeh Karimi, Ying Lin, Sanjeev Satheesh, Jupinder Parmar, Pritam Gundecha, Brandon Norick, Joseph Jennings, Shrimai Prabhumoye, Syeda Nahida Akter, Mostofa Patwary, Abhinav Khattar, Deepak Narayanan, Roger Waleffe, Jimmy Zhang, Bor-Yiing Su, Guyue Huang, Terry Kong, Parth Chadha, Sahil Jain, Christine Harvey, Elad Segal, Jining Huang, Sergey Kashirsky, Robert McQueen, Izzy Putterman, George Lam, Arun Venkatesan, Sherry Wu, Vinh Nguyen, Manoj Kilaru, Andrew Wang, Anna Warno, Abhilash Somasamudramath, Sandip Bhaskar, Maka Dong, Nave Assaf, Shahar Mor, Omer Ullman Argov, Scot Junkin, Oleksandr Romanenko, Pedro Larroy, Monika Katariya, Marco Rovinelli, Viji Balas, Nicholas Edelman, Anahita Bhiwandiwalla, Muthu Subramaniam, Smita Ithape, Karthik Ramamoorthy, Yuting Wu, Suguna Varshini Velury, Omri Almog, Joyjit Daw, Denys Fridman, Erick Galinkin, Michael Evans, Katherine Luna, Leon Derczynski, Nikki Pope, Eileen Long, Seth Schneider, Guillermo Siman, Tomasz Grzegorzek, Pablo Ribalta, Joey Conway, Trisha Saar, Ann Guan, Krzysztof Pawelec, Shyamala Prayaga, Oleksii Kuchaiev, Boris Ginsburg, Oluwatobi Olabiyi, Kari Briski, Jonathan Cohen, Bryan Catanzaro, Jonah Alben, Yonatan Geifman, Eric Chung
cs.AI
Abstract
We introduce the Llama-Nemotron series of models, an open family of
heterogeneous reasoning models that deliver exceptional reasoning capabilities,
inference efficiency, and an open license for enterprise use. The family comes
in three sizes -- Nano (8B), Super (49B), and Ultra (253B) -- and performs
competitively with state-of-the-art reasoning models such as DeepSeek-R1 while
offering superior inference throughput and memory efficiency. In this report,
we discuss the training procedure for these models, which entails applying
neural architecture search to Llama 3 models for accelerated inference, knowledge
distillation, and continued pretraining, followed by a reasoning-focused
post-training stage consisting of two main parts: supervised fine-tuning and
large-scale reinforcement learning. Llama-Nemotron models are the first
open-source models to support a dynamic reasoning toggle, allowing users to
switch between standard chat and reasoning modes during inference. To further
support open research and facilitate model development, we provide the
following resources: 1. We release the Llama-Nemotron reasoning models --
LN-Nano, LN-Super, and LN-Ultra -- under the commercially permissive NVIDIA
Open Model License Agreement. 2. We release the complete post-training dataset:
Llama-Nemotron-Post-Training-Dataset. 3. We also release our training
codebases: NeMo, NeMo-Aligner, and Megatron-LM.
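Per the released model cards, the reasoning toggle is controlled through the system prompt: "detailed thinking on" enables reasoning mode, while "detailed thinking off" yields a standard chat-style response from the same weights. Below is a minimal sketch of that toggle using Hugging Face transformers; the repository id is an assumption, so substitute the actual released checkpoint.

```python
# Minimal sketch of the dynamic reasoning toggle, assuming the system-prompt
# mechanism ("detailed thinking on"/"off") described in the model cards.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3.1-Nemotron-Nano-8B-v1"  # assumed LN-Nano repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

def generate(question: str, reasoning: bool) -> str:
    # The same weights serve both modes; only the system prompt changes.
    mode = "on" if reasoning else "off"
    messages = [
        {"role": "system", "content": f"detailed thinking {mode}"},
        {"role": "user", "content": question},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=2048)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)

print(generate("How many primes are below 100?", reasoning=True))   # reasoning trace, then answer
print(generate("How many primes are below 100?", reasoning=False))  # direct chat-style answer
```

Because the toggle is just a system prompt, a single deployment can serve both latency-sensitive chat traffic and long-form reasoning requests without swapping models.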