LETS-C: Leveraging Language Embedding for Time Series Classification
July 9, 2024
Authors: Rachneet Kaur, Zhen Zeng, Tucker Balch, Manuela Veloso
cs.AI
Abstract
Recent advancements in language modeling have shown promising results when
applied to time series data. In particular, fine-tuning pre-trained large
language models (LLMs) for time series classification tasks has achieved
state-of-the-art (SOTA) performance on standard benchmarks. However, these
LLM-based models have a significant drawback due to the large model size, with
the number of trainable parameters in the millions. In this paper, we propose
an alternative approach to leveraging the success of language modeling in the
time series domain. Instead of fine-tuning LLMs, we utilize a language
embedding model to embed time series and then pair the embeddings with a simple
classification head composed of convolutional neural networks (CNN) and
multilayer perceptron (MLP). We conducted extensive experiments on
well-established time series classification benchmark datasets. We demonstrated
that LETS-C not only outperforms the current SOTA in classification accuracy but
also offers a lightweight solution, using only 14.5% of the trainable
parameters on average compared to the SOTA model. Our findings suggest that
leveraging language encoders to embed time series data, combined with a simple
yet effective classification head, offers a promising direction for achieving
high-performance time series classification while maintaining a lightweight
model architecture.
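To make the pipeline concrete, below is a minimal sketch of a LETS-C-style model. The abstract does not specify the embedding model, how time series are serialized for it, or the exact head configuration, so the sentence-transformers model `all-MiniLM-L6-v2`, the value-to-string serialization, and the layer sizes here are all illustrative assumptions; the key structural points from the abstract are that the language embedding model stays frozen and only a small CNN + MLP head is trained.

```python
# Minimal sketch of a LETS-C-style pipeline. Illustrative assumptions
# (not specified in the abstract): time series are serialized to strings
# of values and embedded with the off-the-shelf sentence-transformers
# model "all-MiniLM-L6-v2"; the head's layer sizes are placeholders.
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer

# 1) Embed time series with a frozen language embedding model
#    (no fine-tuning of any LLM).
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed choice

def embed_series(series_batch):
    """Serialize each 1-D series to text and embed it: (batch, embed_dim)."""
    texts = [" ".join(f"{v:.3f}" for v in s) for s in series_batch]
    return torch.tensor(embedder.encode(texts))

# 2) Lightweight classification head: a 1-D CNN over the embedding
#    followed by an MLP, mirroring the CNN + MLP head in the abstract.
class ClassificationHead(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(32),  # pools any embedding length to 32
        )
        self.mlp = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 32, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, emb):              # emb: (batch, embed_dim)
        x = self.conv(emb.unsqueeze(1))  # (batch, 16, 32)
        return self.mlp(x)               # (batch, num_classes)

# Usage: only the head's parameters are trainable.
series = [[0.1, 0.5, 0.3, 0.9], [0.7, 0.2, 0.8, 0.4]]
logits = ClassificationHead(num_classes=3)(embed_series(series))
```

Freezing the embedder is what makes this design lightweight: in this sketch the trainable head amounts to only a few tens of thousands of parameters, consistent with the paper's claim of needing a small fraction of the trainable parameters of fine-tuned LLM baselines.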