
Nemotron-4 15B Technical Report

February 26, 2024
Authors: Jupinder Parmar, Shrimai Prabhumoye, Joseph Jennings, Mostofa Patwary, Sandeep Subramanian, Dan Su, Chen Zhu, Deepak Narayanan, Aastha Jhunjhunwala, Ayush Dattagupta, Vibhu Jawa, Jiwei Liu, Ameya Mahabaleshwarkar, Osvald Nitski, Annika Brundyn, James Maki, Miguel Martinez, Jiaxuan You, John Kamalu, Patrick LeGresley, Denys Fridman, Jared Casper, Ashwath Aithal, Oleksii Kuchaiev, Mohammad Shoeybi, Jonathan Cohen, Bryan Catanzaro
cs.AI

Abstract

We introduce Nemotron-4 15B, a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. Nemotron-4 15B demonstrates strong performance when assessed on English, multilingual, and coding tasks: it outperforms all existing similarly-sized open models on 4 out of 7 downstream evaluation areas and achieves competitive performance to the leading open models in the remaining ones. Specifically, Nemotron-4 15B exhibits the best multilingual capabilities of all similarly-sized models, even outperforming models over four times larger and those explicitly specialized for multilingual tasks.