Tele-FLM Technical Report

Abstract

Large language models (LLMs) have shown profound capabilities in language understanding and generation, enabling a wide array of applications. However, detailed, open-sourced methodologies for efficiently scaling LLMs beyond 50 billion parameters with minimal trial-and-error cost and computational resources remain scarce. In this report, we introduce Tele-FLM (aka FLM-2), a 52B open-sourced multilingual large language model that features a stable, efficient pre-training paradigm and enhanced factual judgment capabilities. Tele-FLM demonstrates superior multilingual language modeling abilities, as measured by bits per byte (BPB) on textual corpora. In addition, in both English and Chinese foundation model evaluations, it is comparable to strong open-sourced models that involve larger pre-training FLOPs, such as Llama2-70B and DeepSeek-67B. Beyond the model weights, we share the core designs, engineering practices, and training details, which we expect to benefit both the academic and industrial communities.
