Technical Report of TeleChat2, TeleChat2.5 and T1

Abstract

We introduce the latest series of TeleChat models: TeleChat2, TeleChat2.5, and T1, a significant upgrade over their predecessor, TeleChat. Despite minimal changes to the model architecture, the new series achieves substantial performance gains through enhanced training strategies in both the pretraining and post-training stages. The series begins with TeleChat2, which is pretrained on 10 trillion high-quality and diverse tokens. This is followed by Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) to further enhance its capabilities. TeleChat2.5 and T1 extend the pipeline with a continual pretraining phase on domain-specific datasets, combined with reinforcement learning (RL) to improve performance on code generation and mathematical reasoning tasks. The T1 variant is designed for complex reasoning: it supports long Chain-of-Thought (CoT) reasoning and shows substantial improvements in mathematics and coding. In contrast, TeleChat2.5 prioritizes speed, delivering rapid inference. The flagship models of T1 and TeleChat2.5 are both dense Transformer-based architectures with 115B parameters, and both show significant advances in reasoning and general task performance over the original TeleChat. Notably, T1-115B outperforms proprietary models such as OpenAI's o1-mini and GPT-4o. We publicly release TeleChat2, TeleChat2.5, and T1, including post-trained versions with 35B and 115B parameters, to give developers and researchers state-of-the-art language models suited to diverse applications.
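The abstract names Direct Preference Optimization as a post-training step but does not give its configuration. As a rough illustration only, the standard DPO loss for a single preference pair can be sketched as below. The function name `dpo_loss`, the default `beta=0.1`, and the use of summed sequence log-probabilities as inputs are assumptions made for this sketch, not details taken from the report.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed log-probabilities of each response under the policy
    being trained and under a frozen reference model. beta is a
    hypothetical default, not a value from the report.
    """
    # How much more the policy favors each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin between the chosen and rejected responses.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)), written as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

When the policy and the reference agree, the margin is zero and the loss equals log 2. The loss falls below that value as the policy shifts probability toward the chosen response relative to the reference.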
