VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters
August 30, 2024
Authors: Mouxiang Chen, Lefei Shen, Zhuo Li, Xiaoyun Joy Wang, Jianling Sun, Chenghao Liu
cs.AI
Abstract
Foundation models have emerged as a promising approach in time series forecasting (TSF). Existing approaches either fine-tune large language models (LLMs) or build large-scale time-series datasets to develop TSF foundation models. However, these methods face challenges due to the severe cross-domain gap or in-domain heterogeneity. In this paper, we explore a new road to building a TSF foundation model from rich, high-quality natural images, based on the intrinsic similarities between images and time series. To bridge the gap between the two domains, we reformulate the TSF task as an image reconstruction task, which is then processed by a visual masked autoencoder (MAE) pre-trained in a self-supervised manner on the ImageNet dataset. Surprisingly, without any further adaptation in the time-series domain, the proposed VisionTS achieves superior zero-shot forecasting performance compared to existing TSF foundation models. With minimal fine-tuning, VisionTS further improves its forecasts and achieves state-of-the-art performance in most cases. These findings suggest that visual models could be a free lunch for TSF and highlight the potential of future cross-domain research between computer vision and TSF. Our code is publicly available at https://github.com/Keytoyze/VisionTS.
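To make the reformulation concrete, here is a minimal sketch (not the authors' actual pipeline) of one way forecasting can be cast as image inpainting: fold the series period-by-period into a grayscale image, append a masked region covering the future steps, and let a pretrained visual MAE fill it in. The `mae_reconstruct` callable is a hypothetical stand-in for such a model; the function names and the folding scheme are illustrative assumptions.

```python
import numpy as np

def series_to_image(series: np.ndarray, period: int) -> np.ndarray:
    # Fold the series so each column holds one period; periodic structure
    # then appears as 2D texture that an image model can exploit.
    n_cols = len(series) // period
    img = series[: n_cols * period].reshape(n_cols, period).T  # (period, n_cols)
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo + 1e-8)  # grayscale values in [0, 1]

def forecast_with_mae(series, period, horizon, mae_reconstruct):
    # Cast forecasting as inpainting: append masked (zeroed) columns for the
    # future, let the MAE reconstruct them, then read the forecast back out.
    img = series_to_image(series, period)
    pad_cols = int(np.ceil(horizon / period))
    masked = np.concatenate([img, np.zeros((period, pad_cols))], axis=1)
    recon = mae_reconstruct(masked, mask_cols=pad_cols)  # hypothetical MAE call
    future = recon[:, -pad_cols:].T.reshape(-1)[:horizon]
    lo, hi = series.min(), series.max()
    return future * (hi - lo) + lo  # approximately undo the normalization

# Usage with a toy series and a trivial stand-in "MAE" that repeats the last
# observed column -- a real visual MAE would inpaint the masked pixels instead.
t = np.arange(24 * 20)
series = np.sin(2 * np.pi * t / 24) + 0.05 * np.random.randn(len(t))
fake_mae = lambda im, mask_cols: np.concatenate(
    [im[:, :-mask_cols], np.tile(im[:, [-mask_cols - 1]], mask_cols)], axis=1)
pred = forecast_with_mae(series, period=24, horizon=48, mae_reconstruct=fake_mae)
print(pred.shape)  # (48,)
```

The folding step is what lets an off-the-shelf image model see temporal regularity: points one period apart land in adjacent columns of the same row, so seasonality becomes spatial structure the MAE was pre-trained to complete.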