Estimating Time Series Foundation Model Transferability via In-Context Learning
September 28, 2025
Authors: Qingren Yao, Ming Jin, Chengqi Zhang, Chao-Han Huck Yang, Jun Qi, Shirui Pan
cs.AI
Abstract
Time series foundation models (TSFMs) offer strong zero-shot forecasting via
large-scale pre-training, yet fine-tuning remains critical for boosting
performance in domains with limited public data. With the growing number of
TSFMs, efficiently identifying the best model for downstream fine-tuning
becomes increasingly challenging. In this work, we introduce TimeTic, a
transferability estimation framework that recasts model selection as an
in-context-learning problem: given observations on known (source) datasets, it
predicts how a TSFM will perform after fine-tuning on a downstream (target)
dataset. TimeTic flexibly organizes the observed model-data relationships as
contextual information, allowing it to adapt seamlessly to various test-time
scenarios. Leveraging the natural tabular structure formed by dataset
meta-features, model characteristics, and fine-tuned performance, we employ
tabular foundation models to serve as in-context learners. We further introduce
a novel model characterization based on entropy evolution across model layers,
capturing embedding-space distinctions and enabling TimeTic to generalize
across arbitrary model sets. We establish a comprehensive benchmark for
transferability estimation, comprising 10 datasets, 10 foundation models, and 3
forecasting tasks. On this benchmark, TimeTic's estimation demonstrates strong
alignment with actual fine-tuned performance for previously unseen datasets,
achieving a mean rank correlation of approximately 0.6 and a 30% improvement
compared to using zero-shot performance as the transferability score.
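The abstract describes characterizing each model by how entropy evolves across its layers. As a rough illustration of that idea (not the paper's actual procedure), one could estimate the Shannon entropy of each layer's embeddings via a histogram and stack the per-layer values into a fixed-length profile; the function names and binning choice here are hypothetical:

```python
import numpy as np

def layer_entropy(embeddings, n_bins=32):
    """Histogram estimate of Shannon entropy for one layer's embedding values."""
    hist, _ = np.histogram(embeddings, bins=n_bins)
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins so log is defined
    return float(-(p * np.log(p)).sum())

def entropy_evolution(layer_embeddings):
    """One entropy value per layer: a vector describing how entropy
    changes with depth, usable as a model descriptor."""
    return np.array([layer_entropy(e) for e in layer_embeddings])

# Toy usage: 4 "layers" of embeddings whose spread shrinks with depth.
rng = np.random.default_rng(0)
layers = [rng.normal(scale=1.0 / (i + 1), size=(64, 16)) for i in range(4)]
profile = entropy_evolution(layers)
```

Because the profile has the same length for any model with the same depth (or can be resampled to one), it slots naturally into the tabular rows alongside dataset meta-features.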
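The reported metric is rank correlation between estimated transferability and actual fine-tuned performance. A minimal self-contained sketch of that evaluation, using a tie-free Spearman correlation (illustrative only; the benchmark's exact scoring code is not shown in the abstract):

```python
import numpy as np

def spearman_rho(a, b):
    """Spearman rank correlation via Pearson correlation of ranks
    (no tie handling, for illustration)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra * rb).sum() / np.sqrt((ra**2).sum() * (rb**2).sum()))

# Toy check: estimated transferability scores for 4 candidate TSFMs
# vs. their actual fine-tuned performance on a target dataset.
pred = [0.9, 0.7, 0.5, 0.3]     # higher = better predicted transfer
actual = [0.85, 0.8, 0.4, 0.2]  # higher = better fine-tuned result
rho = spearman_rho(pred, actual)
```

A rho near 1 means the estimator ranks candidate models in the same order as actual fine-tuning would, which is exactly what matters for model selection.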