GEB-1.3B: Open Lightweight Large Language Model

Abstract

Recently developed large language models (LLMs) such as ChatGPT, Claude, and Llama have demonstrated impressive abilities, even surpassing human-level performance on several tasks. Despite this success, these models are resource-intensive: they require significant computational power for both training and inference, which limits their deployment to high-performance servers. Their heavy computational requirements also often increase response latency. With the growing need for LLMs that run efficiently on CPUs, research on lightweight models optimized for CPU inference has emerged. In this work, we introduce GEB-1.3B, a lightweight LLM trained on 550 billion tokens of Chinese and English text. We employ novel training techniques, including RoPE, Group-Query-Attention, and FlashAttention-2, to accelerate training while maintaining model performance. Additionally, we fine-tune the model on 10 million instruction samples to enhance alignment. GEB-1.3B exhibits outstanding performance on general benchmarks such as MMLU, C-Eval, and CMMLU, outperforming comparable models such as MindLLM-1.3B and TinyLLaMA-1.1B. Notably, the FP32 version of GEB-1.3B achieves commendable inference times on CPUs, and efforts are ongoing to further improve speed through advanced quantization techniques. The release of GEB-1.3B as an open-source model is a significant contribution to the development of lightweight LLMs and promises to foster further research and innovation in the field.
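
The abstract names rotary position embeddings among the techniques used but gives no implementation detail. As a rough illustration only, and not GEB-1.3B's actual code, the sketch below applies a rotary embedding to a single vector in plain Python. The function names `rope` and `dot`, the base of 10000, the interleaved pairing of dimensions, and the toy vectors are all assumptions chosen for the example. It shows the property that makes the technique attractive: the attention score between a rotated query and key depends only on their relative offset.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles that grow with `position`.

    Hypothetical helper for illustration; uses the interleaved-pair layout
    (dims 0-1, 2-3, ...), which is an assumption, not the model's layout.
    """
    if len(vec) % 2 != 0:
        raise ValueError("vector length must be even")
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        # Pair j = i / 2 rotates with frequency base^(-2j / dim).
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a, b):
    """Plain dot product, standing in for an unscaled attention score."""
    return sum(x * y for x, y in zip(a, b))


# Toy query and key vectors (made-up values).
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.0]

# Both pairs of positions are 2 apart, so the scores should agree
# up to floating-point rounding.
score_a = dot(rope(q, 3), rope(k, 1))
score_b = dot(rope(q, 10), rope(k, 8))
```

Because the rotation encodes position directly in the query and key, no separate learned position table is needed, and the same function applies to any sequence position.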
