GEB-1.3B: Open Lightweight Large Language Model
June 14, 2024
Authors: Jie Wu, Yufeng Zhu, Lei Shen, Xuqing Lu
cs.AI
Abstract
Recently developed large language models (LLMs) such as ChatGPT, Claude, and
Llama have demonstrated impressive abilities, even surpassing human-level
performance on several tasks. Despite this success, these models are
resource-intensive: they require significant computational power for both
training and inference, which limits their deployment to high-performance
servers. Their heavy computational requirements also often increase
response latency. With the increasing need for LLMs to
operate efficiently on CPUs, research on lightweight models
optimized for CPU inference has emerged. In this work, we introduce GEB-1.3B, a
lightweight LLM trained on 550 billion tokens of Chinese and English
text. We employ novel training techniques, including RoPE,
Grouped-Query Attention, and FlashAttention-2, to accelerate training while
maintaining model performance. Additionally, we fine-tune the model using 10
million samples of instruction data to enhance alignment. GEB-1.3B performs
strongly on general benchmarks such as MMLU, C-Eval, and CMMLU,
outperforming comparable models such as MindLLM-1.3B and TinyLLaMA-1.1B.
Notably, the FP32 version of GEB-1.3B achieves commendable inference times on
CPUs, with ongoing efforts to further enhance speed through advanced
quantization techniques. The release of GEB-1.3B as an open-source model marks
a significant contribution to the development of lightweight LLMs, promising to
foster further research and innovation in the field.