ラマ・ネモトロン：効率的な推論モデル

要旨

Llama-Nemotronシリーズのモデルを紹介する。これは、優れた推論能力、推論効率、および企業利用のためのオープンライセンスを提供する、異種推論モデルのオープンファミリーである。このファミリーは、Nano（8B）、Super（49B）、Ultra（253B）の3つのサイズで提供され、DeepSeek-R1などの最先端の推論モデルと競争力のある性能を発揮しながら、優れた推論スループットとメモリ効率を提供する。本報告では、これらのモデルのトレーニング手順について議論する。これには、Llama 3モデルからのニューラルアーキテクチャサーチを用いた高速化推論、知識蒸留、および継続的な事前学習が含まれ、その後、推論に焦点を当てたポストトレーニング段階が続く。ポストトレーニング段階は、教師ありファインチューニングと大規模な強化学習の2つの主要部分から構成される。Llama-Nemotronモデルは、動的な推論切り替えをサポートする最初のオープンソースモデルであり、ユーザーは推論中に標準のチャットモードと推論モードを切り替えることができる。オープンリサーチをさらに支援し、モデル開発を促進するために、以下のリソースを提供する：1. Llama-Nemotron推論モデル（LN-Nano、LN-Super、LN-Ultra）を、商業的に許容可能なNVIDIAオープンモデルライセンス契約の下でリリースする。2. 完全なポストトレーニングデータセット（Llama-Nemotron-Post-Training-Dataset）をリリースする。3. トレーニングコードベース（NeMo、NeMo-Aligner、Megatron-LM）もリリースする。

English

We introduce the Llama-Nemotron series of models, an open family of heterogeneous reasoning models that deliver exceptional reasoning capabilities, inference efficiency, and an open license for enterprise use. The family comes in three sizes -- Nano (8B), Super (49B), and Ultra (253B) -- and performs competitively with state-of-the-art reasoning models such as DeepSeek-R1 while offering superior inference throughput and memory efficiency. In this report, we discuss the training procedure for these models, which entails using neural architecture search from Llama 3 models for accelerated inference, knowledge distillation, and continued pretraining, followed by a reasoning-focused post-training stage consisting of two main parts: supervised fine-tuning and large scale reinforcement learning. Llama-Nemotron models are the first open-source models to support a dynamic reasoning toggle, allowing users to switch between standard chat and reasoning modes during inference. To further support open research and facilitate model development, we provide the following resources: 1. We release the Llama-Nemotron reasoning models -- LN-Nano, LN-Super, and LN-Ultra -- under the commercially permissive NVIDIA Open Model License Agreement. 2. We release the complete post-training dataset: Llama-Nemotron-Post-Training-Dataset. 3. We also release our training codebases: NeMo, NeMo-Aligner, and Megatron-LM.