Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling
March 5, 2026
Authors: Yong Liu, Xingjian Su, Shiyu Wang, Haoran Zhang, Haixuan Liu, Yuxuan Wang, Zhou Ye, Yang Xiang, Jianmin Wang, Mingsheng Long
cs.AI
Abstract
We introduce Timer-S1, a strong Mixture-of-Experts (MoE) time series foundation model with 8.3B total parameters, 0.75B parameters activated per token, and a context length of 11.5K. To overcome the scalability bottleneck in existing pre-trained time series foundation models, we perform Serial Scaling along three dimensions: model architecture, dataset, and training pipeline. Timer-S1 integrates sparse TimeMoE blocks and generic TimeSTP blocks for Serial-Token Prediction (STP), a generic training objective that adheres to the serial nature of forecasting. The proposed paradigm introduces serial computations to improve long-term predictions while avoiding the costly rolling-style inference and pronounced error accumulation of standard next-token prediction. Pursuing a high-quality and unbiased training dataset, we curate TimeBench, a corpus of one trillion time points, and apply meticulous data augmentation to mitigate predictive bias. We further pioneer a post-training stage, including continued pre-training and long-context extension, to enhance short-term and long-context performance. Evaluated on the large-scale GIFT-Eval leaderboard, Timer-S1 achieves state-of-the-art forecasting performance, attaining the best MASE and CRPS scores among pre-trained models. Timer-S1 will be released to facilitate further research.
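
The abstract contrasts STP with standard next-token prediction, where a model rolls forward one token at a time and feeds its own outputs back in. Below is a minimal sketch of that contrast, assuming a patch-based Transformer; `ToyBackbone`, `SerialHead`, `patch_len`, and `horizon_tokens` are illustrative names and shapes, not Timer-S1's actual modules (the abstract does not specify the architecture or prediction-head design).

```python
# Minimal sketch: serial-token prediction emits a whole forecast horizon in
# one forward pass, instead of rolling the model H times as in standard
# next-token prediction. All module names and sizes here are illustrative
# assumptions, not the released Timer-S1 implementation.
import torch
import torch.nn as nn

class ToyBackbone(nn.Module):
    """Stand-in for the Transformer backbone (e.g., TimeMoE/TimeSTP blocks)."""
    def __init__(self, patch_len: int, d_model: int):
        super().__init__()
        self.embed = nn.Linear(patch_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (batch, num_patches, patch_len) -> (batch, num_patches, d_model)
        return self.encoder(self.embed(patches))

class SerialHead(nn.Module):
    """Project the final hidden state to H future patches at once, avoiding
    H rounds of rolling inference and the error accumulation they cause."""
    def __init__(self, d_model: int, patch_len: int, horizon_tokens: int):
        super().__init__()
        self.horizon_tokens = horizon_tokens
        self.patch_len = patch_len
        self.proj = nn.Linear(d_model, horizon_tokens * patch_len)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        last = hidden[:, -1]                       # (batch, d_model)
        out = self.proj(last)                      # (batch, H * patch_len)
        return out.view(-1, self.horizon_tokens, self.patch_len)

patch_len, d_model, horizon = 32, 64, 4
backbone = ToyBackbone(patch_len, d_model)
head = SerialHead(d_model, patch_len, horizon)
context = torch.randn(8, 16, patch_len)            # 8 series, 16 context patches
forecast = head(backbone(context))                 # (8, 4, 32): full horizon in one pass
print(forecast.shape)
```

Under next-token prediction, producing the same four patches would take four forward passes, each conditioned on the previous prediction, so early errors compound across the horizon; the serial head emits the entire horizon from a single pass over the observed context.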