TinyLlama: An Open-Source Small Language Model

Abstract

We present TinyLlama, a compact 1.1B language model pretrained on around 1 trillion tokens for approximately 3 epochs. Building on the architecture and tokenizer of Llama 2, TinyLlama leverages various advances contributed by the open-source community (e.g., FlashAttention) to achieve better computational efficiency. Despite its relatively small size, TinyLlama performs remarkably well on a series of downstream tasks and significantly outperforms existing open-source language models of comparable size. Our model checkpoints and code are publicly available on GitHub at https://github.com/jzhang38/TinyLlama.
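
To make the scale of the training run concrete, the figures in the abstract can be turned into a rough token budget. The following is a minimal sketch that takes the approximate values quoted above (1.1B parameters, about 1 trillion tokens, about 3 epochs) at face value; the function name `training_budget` and the constants are illustrative and do not come from the TinyLlama codebase.

```python
# Back-of-the-envelope training budget from the figures in the abstract.
# All three inputs are the approximate values quoted there, not exact counts.
PARAMS = 1.1e9         # model size: 1.1B parameters
UNIQUE_TOKENS = 1e12   # "around 1 trillion tokens"
EPOCHS = 3             # "approximately 3 epochs"


def training_budget(params, unique_tokens, epochs):
    """Return (total tokens seen in training, tokens seen per parameter)."""
    total_tokens = unique_tokens * epochs
    return total_tokens, total_tokens / params


total_tokens, tokens_per_param = training_budget(PARAMS, UNIQUE_TOKENS, EPOCHS)
print(f"total tokens seen: {total_tokens:.2e}")
print(f"tokens per parameter: {tokens_per_param:.0f}")
```

Under these assumptions the model sees roughly 3 trillion tokens in total, or on the order of a few thousand tokens per parameter.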
