
Nemotron-4 15B Technical Report

February 26, 2024
Authors: Jupinder Parmar, Shrimai Prabhumoye, Joseph Jennings, Mostofa Patwary, Sandeep Subramanian, Dan Su, Chen Zhu, Deepak Narayanan, Aastha Jhunjhunwala, Ayush Dattagupta, Vibhu Jawa, Jiwei Liu, Ameya Mahabaleshwarkar, Osvald Nitski, Annika Brundyn, James Maki, Miguel Martinez, Jiaxuan You, John Kamalu, Patrick LeGresley, Denys Fridman, Jared Casper, Ashwath Aithal, Oleksii Kuchaiev, Mohammad Shoeybi, Jonathan Cohen, Bryan Catanzaro
cs.AI

Abstract

We introduce Nemotron-4 15B, a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. Nemotron-4 15B demonstrates strong performance when assessed on English, multilingual, and coding tasks: it outperforms all existing similarly-sized open models on 4 out of 7 downstream evaluation areas and achieves performance competitive with the leading open models in the remaining ones. Specifically, Nemotron-4 15B exhibits the best multilingual capabilities of all similarly-sized models, even outperforming models over four times larger and those explicitly specialized for multilingual tasks.