Tele-FLM Technical Report

Abstract

Large language models (LLMs) have shown strong capabilities in language understanding and generation, enabling a wide range of applications. However, detailed open-source methodologies for efficiently scaling LLMs beyond 50 billion parameters, with minimal trial-and-error cost and computational resources, remain scarce. In this report, we introduce Tele-FLM (aka FLM-2), a 52B open-source multilingual large language model that features a stable, efficient pre-training paradigm and enhanced factual judgment capabilities. Tele-FLM demonstrates superior multilingual language modeling abilities, as measured by bits-per-byte (BPB) on text corpora. In addition, on both English and Chinese foundation model evaluations, it is comparable to strong open-source models trained with more pre-training FLOPs, such as Llama2-70B and DeepSeek-67B. Alongside the model weights, we share the core designs, engineering practices, and training details, which we expect to benefit both the academic and industrial communities.
