BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model

September 20, 2023
Authors: Nolan Dey, Daria Soboleva, Faisal Al-Khateeb, Bowen Yang, Ribhu Pathria, Hemant Khachane, Shaheer Muhammad, Zhiming Chen, Robert Myers, Jacob Robert Steeves, Natalia Vassilieva, Marvin Tom, Joel Hestness
cs.AI

Abstract

We introduce the Bittensor Language Model, called "BTLM-3B-8K", a new state-of-the-art 3 billion parameter open-source language model. BTLM-3B-8K was trained on 627B tokens from the SlimPajama dataset with a mixture of 2,048 and 8,192 context lengths. BTLM-3B-8K outperforms all existing 3B parameter models by 2-5.5% across downstream tasks. BTLM-3B-8K is even competitive with some 7B parameter models. Additionally, BTLM-3B-8K provides excellent long context performance, outperforming MPT-7B-8K and XGen-7B-8K on tasks up to 8,192 context length. We trained the model on a cleaned and deduplicated SlimPajama dataset; aggressively tuned the μP hyperparameters and schedule; used ALiBi position embeddings; and adopted the SwiGLU nonlinearity. On Hugging Face, the most popular models have 7B parameters, indicating that users prefer the quality-size ratio of 7B models. Compacting the 7B parameter model to one with 3B parameters, with little performance impact, is an important milestone. BTLM-3B-8K needs only 3GB of memory with 4-bit precision and takes 2.5x less inference compute than 7B models, helping to open up access to a powerful language model on mobile and edge devices. BTLM-3B-8K is available under an Apache 2.0 license on Hugging Face: https://huggingface.co/cerebras/btlm-3b-8k-base.
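The abstract credits part of BTLM-3B-8K's long-context ability to ALiBi position embeddings, which replace learned positional vectors with a fixed, head-specific linear penalty added to attention scores, letting the model extrapolate to contexts longer than those seen in training. Below is a minimal PyTorch sketch of the general ALiBi bias (following Press et al.'s slope recipe for power-of-two head counts), not Cerebras' exact implementation:

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Per-head linear attention biases as in ALiBi (Press et al., 2022)."""
    # Geometric slopes 2^(-8/n), 2^(-16/n), ..., one per attention head.
    slopes = torch.tensor([2.0 ** (-8.0 * (i + 1) / n_heads) for i in range(n_heads)])
    pos = torch.arange(seq_len)
    # rel[i, j] = j - i: zero on the diagonal, increasingly negative for older keys;
    # clamp zeroes out future positions, which the causal mask handles anyway.
    rel = (pos[None, :] - pos[:, None]).clamp(max=0)
    # Shape (n_heads, seq_len, seq_len); add to attention logits before softmax.
    return slopes[:, None, None] * rel[None, :, :]
```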
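The SwiGLU nonlinearity the model adopts is a gated feed-forward variant (Shazeer, 2020): the hidden activation is the SiLU ("swish") of one linear projection, gated elementwise by a second projection, then projected back to the model dimension. A minimal sketch assuming standard PyTorch; the layer names here are illustrative, not BTLM's:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: silu(x @ W_gate) * (x @ W_up), then W_down."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gate branch passes through SiLU; up branch is multiplied in elementwise.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```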
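The 3GB figure for 4-bit inference is consistent with back-of-the-envelope arithmetic: 3B parameters at 0.5 bytes each is roughly 1.5GB of weights, with the remainder going to activations, the KV cache, and dequantization buffers. Below is a hedged sketch of loading the released checkpoint in 4-bit via Hugging Face transformers with bitsandbytes; exact flags vary across library versions, and trust_remote_code is needed because the Hub repo ships a custom model class:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "cerebras/btlm-3b-8k-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # ~4 bits/weight
    device_map="auto",
    trust_remote_code=True,  # BTLM uses a custom architecture on the Hub
)

inputs = tokenizer("The Bittensor Language Model is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```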