Estimating Time Series Foundation Model Transferability via In-Context Learning
September 28, 2025
Authors: Qingren Yao, Ming Jin, Chengqi Zhang, Chao-Han Huck Yang, Jun Qi, Shirui Pan
cs.AI
Abstract
Time series foundation models (TSFMs) offer strong zero-shot forecasting via
large-scale pre-training, yet fine-tuning remains critical for boosting
performance in domains with limited public data. With the growing number of
TSFMs, efficiently identifying the best model for downstream fine-tuning
becomes increasingly challenging. In this work, we introduce TimeTic, a
transferability estimation framework that recasts model selection as an
in-context-learning problem: given observations on known (source) datasets, it
predicts how a TSFM will perform after fine-tuning on a downstream (target)
dataset. TimeTic flexibly organizes the observed model-data relationships as
contextual information, allowing it to adapt seamlessly to various test-time
scenarios. Leveraging the natural tabular structure formed by dataset
meta-features, model characteristics, and fine-tuned performance, we employ
tabular foundation models to serve as in-context learners. We further introduce
a novel model characterization based on entropy evolution across model layers,
capturing embedding-space distinctions and enabling TimeTic to generalize
across arbitrary model sets. We establish a comprehensive benchmark for
transferability estimation comprising 10 datasets, 10 foundation models, and 3
forecasting tasks. On this benchmark, TimeTic's estimation demonstrates strong
alignment with actual fine-tuned performance for previously unseen datasets,
achieving a mean rank correlation of approximately 0.6 and a 30% improvement
compared to using zero-shot performance as the transferability score.
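The pipeline described above can be sketched end to end: pair each source dataset's meta-features with each model's characterization to form a table of context rows labeled with observed fine-tuned performance, query the same table for the target dataset, and score the estimates by rank correlation against actual fine-tuned results. This is a minimal illustration with synthetic data; the `layer_entropy` function, the ridge regressor standing in for the tabular foundation model, and all feature dimensions are assumptions for the sketch, not details from the paper.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_models, n_datasets, n_layers = 10, 10, 3
meta_dim = 4  # illustrative number of dataset meta-features

def layer_entropy(acts):
    # Shannon entropy of a softmax distribution over embedding dimensions,
    # averaged over tokens -- a simplified stand-in for the paper's
    # entropy-evolution characterization of a model layer.
    p = np.exp(acts) / np.exp(acts).sum(axis=-1, keepdims=True)
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())

# Characterize each model by its entropy profile across layers
# (random activations here; real use would probe actual embeddings).
model_char = np.array([
    [layer_entropy(rng.normal(size=(16, 8))) for _ in range(n_layers)]
    for _ in range(n_models)
])
dataset_meta = rng.normal(size=(n_datasets, meta_dim))  # e.g. length, frequency stats

# Context table: (dataset meta || model characterization) -> fine-tuned score.
rows, labels = [], []
for d in range(n_datasets - 1):          # source datasets form the context
    for m in range(n_models):
        rows.append(np.concatenate([dataset_meta[d], model_char[m]]))
        labels.append(rng.normal())      # placeholder for observed performance
X_ctx, y_ctx = np.array(rows), np.array(labels)

# Query rows: the held-out target dataset paired with every candidate model.
X_qry = np.array([np.concatenate([dataset_meta[-1], model_char[m]])
                  for m in range(n_models)])

# A simple regressor plays the role of the tabular in-context learner.
scores = Ridge().fit(X_ctx, y_ctx).predict(X_qry)

# Evaluation: Spearman rank correlation between estimated and actual
# fine-tuned performance across the candidate models.
actual = rng.normal(size=n_models)       # placeholder ground truth
rho, _ = spearmanr(scores, actual)
```

In this framing, model selection reduces to ranking the candidate models by `scores` for the target dataset; the benchmark metric is how well that ranking agrees with the ranking by actual fine-tuned performance.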