BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model

Abstract

We introduce the Bittensor Language Model, called "BTLM-3B-8K", a new state-of-the-art 3 billion parameter open-source language model. BTLM-3B-8K was trained on 627B tokens from the SlimPajama dataset with a mixture of 2,048 and 8,192 context lengths. BTLM-3B-8K outperforms all existing 3B parameter models by 2-5.5% across downstream tasks. BTLM-3B-8K is even competitive with some 7B parameter models. Additionally, BTLM-3B-8K provides excellent long context performance, outperforming MPT-7B-8K and XGen-7B-8K on tasks up to 8,192 context length. We trained the model on a cleaned and deduplicated SlimPajama dataset; aggressively tuned the μP hyperparameters and schedule; used ALiBi position embeddings; and adopted the SwiGLU nonlinearity. On Hugging Face, the most popular models have 7B parameters, indicating that users prefer the quality-size ratio of 7B models. Compacting the 7B parameter model to one with 3B parameters, with little performance impact, is an important milestone. BTLM-3B-8K needs only 3GB of memory with 4-bit precision and takes 2.5x less inference compute than 7B models, helping to open up access to a powerful language model on mobile and edge devices. BTLM-3B-8K is available under an Apache 2.0 license on Hugging Face: https://huggingface.co/cerebras/btlm-3b-8k-base.
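
To make the memory claim concrete, the following is a minimal back-of-the-envelope sketch of weight storage at different precisions. It is an illustration and not taken from the paper: the function name `weight_memory_gb`, the nominal parameter counts (3e9 and 7e9, read off the "3B" and "7B" size labels), and the choice of 1 GB = 1e9 bytes are all assumptions, and the estimate covers weights only.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    Counts only the model weights; activations, caches, and runtime
    overhead are deliberately left out of this estimate.
    """
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1e9


# Nominal parameter counts read off the size labels (an assumption;
# exact counts are not given in the abstract).
NOMINAL_PARAMS = {"3B": 3e9, "7B": 7e9}

if __name__ == "__main__":
    for label, num_params in NOMINAL_PARAMS.items():
        for bits in (16, 4):
            size = weight_memory_gb(num_params, bits)
            print(f"{label} model at {bits}-bit: {size:.2f} GB of weights")
```

Under these assumptions, a nominal 3B model at 4-bit precision needs about 1.5 GB for weights alone, against about 3.5 GB for a nominal 7B model. The 3GB figure quoted in the abstract is larger than the weights-only estimate, which would be consistent with it describing total memory rather than weight storage, though the abstract does not say how the figure was measured.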
