Tele-FLM Technical Report
April 25, 2024
Authors: Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, Xinzhang Liu, Zihan Wang, Yu Zhao, Xin Wang, Yuyao Huang, Shuangyong Song, Yongxiang Li, Zheng Zhang, Bo Zhao, Aixin Sun, Yequan Wang, Zhongjiang He, Zhongyuan Wang, Xuelong Li, Tiejun Huang
cs.AI
Abstract
Large language models (LLMs) have showcased profound capabilities in language
understanding and generation, facilitating a wide array of applications.
However, there is a notable paucity of detailed, open-sourced methodologies on
efficiently scaling LLMs beyond 50 billion parameters with minimum
trial-and-error cost and computational resources. In this report, we introduce
Tele-FLM (aka FLM-2), a 52B open-sourced multilingual large language model that
features a stable, efficient pre-training paradigm and enhanced factual
judgment capabilities. Tele-FLM demonstrates superior multilingual language
modeling abilities, measured by BPB on textual corpora. Furthermore, in both English
and Chinese foundation model evaluation, it is comparable to strong
open-sourced models that involve larger pre-training FLOPs, such as Llama2-70B
and DeepSeek-67B. In addition to the model weights, we share the core designs,
engineering practices, and training details, which we expect to benefit both
the academic and industrial communities.
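For context, bits-per-byte (BPB) normalizes language-modeling loss by the UTF-8 byte length of the evaluated text, which makes scores comparable across models with different tokenizers and across languages. Below is a minimal sketch of that conversion, assuming the average per-token cross-entropy loss is reported in nats; the function name and numeric inputs are illustrative and not taken from the report.

```python
import math

def bits_per_byte(loss_per_token_nats: float, num_tokens: int, num_bytes: int) -> float:
    """Convert average per-token cross-entropy loss (in nats) to bits-per-byte.

    BPB = (loss_per_token_nats * num_tokens) / (ln(2) * num_bytes)
    """
    total_nats = loss_per_token_nats * num_tokens
    return total_nats / (math.log(2) * num_bytes)

# Hypothetical numbers for illustration only: a corpus of 1,000,000 UTF-8 bytes
# tokenized into 250,000 tokens, with an average loss of 2.0 nats per token.
example_bpb = bits_per_byte(loss_per_token_nats=2.0, num_tokens=250_000, num_bytes=1_000_000)
print(f"BPB: {example_bpb:.3f}")  # ~0.721
```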