Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling
March 5, 2026
Authors: Yong Liu, Xingjian Su, Shiyu Wang, Haoran Zhang, Haixuan Liu, Yuxuan Wang, Zhou Ye, Yang Xiang, Jianmin Wang, Mingsheng Long
cs.AI
Abstract
We introduce Timer-S1, a strong Mixture-of-Experts (MoE) time series foundation model with 8.3B total parameters, 0.75B parameters activated per token, and a context length of 11.5K. To overcome the scalability bottleneck in existing pre-trained time series foundation models, we perform Serial Scaling along three dimensions: model architecture, dataset, and training pipeline. Timer-S1 integrates sparse TimeMoE blocks and generic TimeSTP blocks for Serial-Token Prediction (STP), a generic training objective that adheres to the serial nature of forecasting. The proposed paradigm introduces serial computations to improve long-term predictions while avoiding the costly rolling-style inference and pronounced error accumulation of standard next-token prediction. Pursuing a high-quality and unbiased training dataset, we curate TimeBench, a corpus of one trillion time points, and apply meticulous data augmentation to mitigate predictive bias. We further pioneer a post-training stage, including continued pre-training and long-context extension, to enhance short-term and long-context performance. Evaluated on the large-scale GIFT-Eval leaderboard, Timer-S1 achieves state-of-the-art forecasting performance, attaining the best MASE and CRPS scores among pre-trained models. Timer-S1 will be released to facilitate further research.
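
The abstract contrasts STP with standard next-token prediction, where a model rolls forward one token at a time and feeds its own outputs back in. Below is a minimal sketch of that contrast, assuming a patch-based Transformer; `ToyBackbone`, `SerialHead`, `patch_len`, and `horizon_tokens` are illustrative names and shapes, not Timer-S1's actual modules (the abstract does not specify the architecture or prediction-head design).

```python
# Minimal sketch: serial-token prediction emits a whole forecast horizon in
# one forward pass, instead of rolling the model H times as in standard
# next-token prediction. All module names and sizes here are illustrative
# assumptions, not the released Timer-S1 implementation.
import torch
import torch.nn as nn

class ToyBackbone(nn.Module):
    """Stand-in for the Transformer backbone (e.g., TimeMoE/TimeSTP blocks)."""
    def __init__(self, patch_len: int, d_model: int):
        super().__init__()
        self.embed = nn.Linear(patch_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (batch, num_patches, patch_len) -> (batch, num_patches, d_model)
        return self.encoder(self.embed(patches))

class SerialHead(nn.Module):
    """Project the final hidden state to H future patches at once, avoiding
    H rounds of rolling inference and the error accumulation they cause."""
    def __init__(self, d_model: int, patch_len: int, horizon_tokens: int):
        super().__init__()
        self.horizon_tokens = horizon_tokens
        self.patch_len = patch_len
        self.proj = nn.Linear(d_model, horizon_tokens * patch_len)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        last = hidden[:, -1]                       # (batch, d_model)
        out = self.proj(last)                      # (batch, H * patch_len)
        return out.view(-1, self.horizon_tokens, self.patch_len)

patch_len, d_model, horizon = 32, 64, 4
backbone = ToyBackbone(patch_len, d_model)
head = SerialHead(d_model, patch_len, horizon)
context = torch.randn(8, 16, patch_len)            # 8 series, 16 context patches
forecast = head(backbone(context))                 # (8, 4, 32): full horizon in one pass
print(forecast.shape)
```

Under next-token prediction, producing the same four patches would take four forward passes, each conditioned on the previous prediction, so early errors compound across the horizon; the serial head emits the entire horizon from a single pass over the observed context.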