Tele-FLM Technical Report
April 25, 2024
Authors: Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, Xinzhang Liu, Zihan Wang, Yu Zhao, Xin Wang, Yuyao Huang, Shuangyong Song, Yongxiang Li, Zheng Zhang, Bo Zhao, Aixin Sun, Yequan Wang, Zhongjiang He, Zhongyuan Wang, Xuelong Li, Tiejun Huang
cs.AI
Abstract
Large language models (LLMs) have showcased profound capabilities in language
understanding and generation, facilitating a wide array of applications.
However, there is a notable paucity of detailed, open-sourced methodologies for
efficiently scaling LLMs beyond 50 billion parameters with minimal
trial-and-error cost and computational resources. In this report, we introduce
Tele-FLM (aka FLM-2), a 52B open-sourced multilingual large language model that
features a stable, efficient pre-training paradigm and enhanced factual
judgment capabilities. Tele-FLM demonstrates superior multilingual language
modeling abilities, measured by bits-per-byte (BPB) on textual corpora. Moreover, in both English
and Chinese foundation model evaluation, it is comparable to strong
open-sourced models that involve larger pre-training FLOPs, such as Llama2-70B
and DeepSeek-67B. In addition to the model weights, we share the core designs,
engineering practices, and training details, which we expect to benefit both
the academic and industrial communities.
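The abstract reports multilingual language modeling quality in bits-per-byte (BPB), a tokenizer-agnostic compression metric. As a minimal sketch of how such a number is typically obtained (the function name and inputs below are illustrative assumptions, not taken from the report), BPB can be computed by summing the model's cross-entropy loss in nats over a corpus and normalizing by the corpus size in UTF-8 bytes:

```python
import math

def bits_per_byte(total_nll_nats: float, num_utf8_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) over a corpus
    into bits-per-byte, normalizing by the corpus size in UTF-8 bytes.

    Illustrative helper; the report itself does not specify this exact code.
    """
    # Divide by ln(2) to convert nats to bits, then by the byte count.
    return total_nll_nats / (math.log(2) * num_utf8_bytes)

# Hypothetical example: a corpus of 1,000,000 UTF-8 bytes whose tokenized
# form yields a summed cross-entropy of 1,500,000 nats.
print(bits_per_byte(1_500_000, 1_000_000))  # ~2.16 BPB (lower is better)
```

Because the normalizer is raw bytes rather than tokens, BPB allows comparing models that use different tokenizers, which is why it is a natural choice for multilingual evaluation.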