BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
September 20, 2023
Authors: Nolan Dey, Daria Soboleva, Faisal Al-Khateeb, Bowen Yang, Ribhu Pathria, Hemant Khachane, Shaheer Muhammad, Zhiming Chen, Robert Myers, Jacob Robert Steeves, Natalia Vassilieva, Marvin Tom, Joel Hestness
cs.AI
Abstract
We introduce the Bittensor Language Model, called "BTLM-3B-8K", a new
state-of-the-art 3 billion parameter open-source language model. BTLM-3B-8K was
trained on 627B tokens from the SlimPajama dataset with a mixture of 2,048 and
8,192 context lengths. BTLM-3B-8K outperforms all existing 3B parameter models
by 2-5.5% across downstream tasks. BTLM-3B-8K is even competitive with some 7B
parameter models. Additionally, BTLM-3B-8K provides excellent long context
performance, outperforming MPT-7B-8K and XGen-7B-8K on tasks up to 8,192
context length. We trained the model on a cleaned and deduplicated SlimPajama
dataset; aggressively tuned the μP hyperparameters and schedule; used
ALiBi position embeddings; and adopted the SwiGLU nonlinearity.
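
For concreteness, here is a minimal PyTorch sketch of the two architectural components named above. The module names, hidden width, and bias-free projections are illustrative assumptions rather than BTLM's exact configuration; the SwiGLU formulation follows Shazeer (2020) and the ALiBi slope scheme follows Press et al. (2022).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Feed-forward block with the SwiGLU nonlinearity:
    W_down(SiLU(x W_gate) * (x W_up))."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)  # gate projection
        self.w_up = nn.Linear(d_model, d_ff, bias=False)    # value projection
        self.w_down = nn.Linear(d_ff, d_model, bias=False)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """ALiBi: a fixed linear penalty added to attention logits in place of
    position embeddings. Assumes num_heads is a power of two, as in the
    original ALiBi recipe."""
    # Per-head slopes form the geometric sequence 2^(-8/h), 2^(-16/h), ...
    slopes = torch.tensor(
        [2.0 ** (-8.0 * (i + 1) / num_heads) for i in range(num_heads)]
    )
    pos = torch.arange(seq_len)
    # key_pos - query_pos is <= 0 for causal positions, so distant keys are
    # penalized linearly; the causal mask itself is applied separately.
    dist = (pos[None, :] - pos[:, None]).float()     # (seq_len, seq_len)
    return slopes[:, None, None] * dist[None, :, :]  # (heads, seq_len, seq_len)
```

Because ALiBi encodes position as a distance penalty rather than a learned embedding table, a model trained largely at 2,048-token context can still be run at 8,192 tokens, which is what the mixed-context training recipe above exploits.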
On Hugging Face, the most popular models have 7B parameters, indicating that
users prefer the quality-size ratio of 7B models. Compacting the 7B parameter
model to one with 3B parameters, with little performance impact, is an
important milestone. BTLM-3B-8K needs only 3GB of memory with 4-bit precision
and takes 2.5x less inference compute than 7B models, helping to open up access
to a powerful language model on mobile and edge devices. BTLM-3B-8K is
available under an Apache 2.0 license on Hugging Face:
https://huggingface.co/cerebras/btlm-3b-8k-base.
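
As a usage illustration, the following sketch loads the released checkpoint with Hugging Face transformers. The 4-bit path assumes the optional bitsandbytes and accelerate packages and a CUDA device; the prompt and generation settings are arbitrary.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit weight quantization via bitsandbytes (requires a CUDA device);
# drop quantization_config to load in full precision instead.
quant_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained("cerebras/btlm-3b-8k-base")
model = AutoModelForCausalLM.from_pretrained(
    "cerebras/btlm-3b-8k-base",
    quantization_config=quant_config,
    device_map="auto",       # placement handled by the accelerate package
    trust_remote_code=True,  # the repo ships custom modeling code
)

prompt = "The Bittensor Language Model is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```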