

Technical Report of TeleChat2, TeleChat2.5 and T1

July 24, 2025
Authors: Zihan Wang, Xinzhang Liu, Yitong Yao, Chao Wang, Yu Zhao, Zhihao Yang, Wenmin Deng, Kaipeng Jia, Jiaxin Peng, Yuyao Huang, Sishi Xiong, Zhuo Jiang, Kaidong Yu, Xiaohui Hu, Fubei Yao, Ruiyu Fang, Zhuoru Jiang, Ruiting Song, Qiyi Xie, Rui Xue, Xuewei He, Yanlei Xue, Zhu Yuan, Zhaoxi Zhang, Zilu Huang, Shiquan Wang, Xin Wang, Hanming Wu, Mingyuan Wang, Xufeng Zhan, Yuhan Sun, Zhaohu Xing, Yuhao Jiang, Bingkai Yang, Shuangyong Song, Yongxiang Li, Zhongjiang He, Xuelong Li
cs.AI

Abstract

We introduce the latest series of TeleChat models: TeleChat2, TeleChat2.5, and T1, offering a significant upgrade over their predecessor, TeleChat. Despite minimal changes to the model architecture, the new series achieves substantial performance gains through enhanced training strategies in both the pre-training and post-training stages. The series begins with TeleChat2, which undergoes pre-training on 10 trillion high-quality and diverse tokens, followed by Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) to further enhance its capabilities. TeleChat2.5 and T1 extend the pipeline with a continual pre-training phase on domain-specific datasets, combined with reinforcement learning (RL) to improve performance on code generation and mathematical reasoning tasks. The T1 variant is designed for complex reasoning, supporting long Chain-of-Thought (CoT) reasoning and demonstrating substantial improvements in mathematics and coding. In contrast, TeleChat2.5 prioritizes speed, delivering rapid inference. The flagship models of both T1 and TeleChat2.5 are dense Transformer architectures with 115B parameters, showing significant advancements in reasoning and general task performance compared to the original TeleChat. Notably, T1-115B outperforms proprietary models such as OpenAI's o1-mini and GPT-4o. We publicly release TeleChat2, TeleChat2.5, and T1, including post-trained versions with 35B and 115B parameters, to empower developers and researchers with state-of-the-art language models tailored for diverse applications.
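For context on the post-training recipe, the Direct Preference Optimization (DPO) step mentioned above is conventionally formulated as below. This is the standard DPO objective from the literature (Rafailov et al., 2023), shown here for reference rather than an equation taken from the report itself:

$$\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\!\left[\log\sigma\!\left(\beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)} - \beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\right)\right]$$

where $x$ is a prompt, $y_w$ and $y_l$ are the preferred and dispreferred responses in a preference dataset $\mathcal{D}$, $\pi_{\mathrm{ref}}$ is the frozen SFT policy used as a reference, $\sigma$ is the logistic function, and $\beta$ controls how far the optimized policy $\pi_\theta$ may drift from the reference.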