Tele-FLM Technical Report
April 25, 2024
Authors: Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, Xinzhang Liu, Zihan Wang, Yu Zhao, Xin Wang, Yuyao Huang, Shuangyong Song, Yongxiang Li, Zheng Zhang, Bo Zhao, Aixin Sun, Yequan Wang, Zhongjiang He, Zhongyuan Wang, Xuelong Li, Tiejun Huang
cs.AI
Abstract
Large language models (LLMs) have showcased profound capabilities in language
understanding and generation, facilitating a wide array of applications.
However, there is a notable paucity of detailed, open-sourced methodologies on
efficiently scaling LLMs beyond 50 billion parameters with minimum
trial-and-error cost and computational resources. In this report, we introduce
Tele-FLM (aka FLM-2), a 52B open-sourced multilingual large language model that
features a stable, efficient pre-training paradigm and enhanced factual
judgment capabilities. Tele-FLM demonstrates superior multilingual language
modeling abilities, measured by BPB on textual corpora. Furthermore, in both English
and Chinese foundation model evaluation, it is comparable to strong
open-sourced models that involve larger pre-training FLOPs, such as Llama2-70B
and DeepSeek-67B. In addition to the model weights, we share the core designs,
engineering practices, and training details, which we expect to benefit both
the academic and industrial communities.
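For context, bits-per-byte (BPB) normalizes language-modeling loss by the UTF-8 byte length of the evaluated text, which makes scores comparable across models with different tokenizers and across languages. Below is a minimal sketch of that conversion, assuming the average per-token cross-entropy loss is reported in nats; the function name and numeric inputs are illustrative and not taken from the report.

```python
import math

def bits_per_byte(loss_per_token_nats: float, num_tokens: int, num_bytes: int) -> float:
    """Convert average per-token cross-entropy loss (in nats) to bits-per-byte.

    BPB = (loss_per_token_nats * num_tokens) / (ln(2) * num_bytes)
    """
    total_nats = loss_per_token_nats * num_tokens
    return total_nats / (math.log(2) * num_bytes)

# Hypothetical numbers for illustration only: a corpus of 1,000,000 UTF-8 bytes
# tokenized into 250,000 tokens, with an average loss of 2.0 nats per token.
example_bpb = bits_per_byte(loss_per_token_nats=2.0, num_tokens=250_000, num_bytes=1_000_000)
print(f"BPB: {example_bpb:.3f}")  # ~0.721
```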