LETS-C: Leveraging Language Embedding for Time Series Classification
July 9, 2024
Authors: Rachneet Kaur, Zhen Zeng, Tucker Balch, Manuela Veloso
cs.AI
Abstract
Recent advancements in language modeling have shown promising results when
applied to time series data. In particular, fine-tuning pre-trained large
language models (LLMs) for time series classification tasks has achieved
state-of-the-art (SOTA) performance on standard benchmarks. However, these
LLM-based models have a significant drawback due to the large model size, with
the number of trainable parameters in the millions. In this paper, we propose
an alternative approach to leveraging the success of language modeling in the
time series domain. Instead of fine-tuning LLMs, we utilize a language
embedding model to embed time series and then pair the embeddings with a simple
classification head composed of convolutional neural networks (CNN) and
multilayer perceptron (MLP). We conducted extensive experiments on
well-established time series classification benchmark datasets. We demonstrated
that LETS-C not only outperforms the current SOTA in classification accuracy but
also offers a lightweight solution, using only 14.5% of the trainable
parameters on average compared to the SOTA model. Our findings suggest that
leveraging language encoders to embed time series data, combined with a simple
yet effective classification head, offers a promising direction for achieving
high-performance time series classification while maintaining a lightweight
model architecture.
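To make the pipeline concrete, below is a minimal sketch of a LETS-C-style model. The abstract does not specify the embedding model, how time series are serialized for it, or the exact head configuration, so the sentence-transformers model `all-MiniLM-L6-v2`, the value-to-string serialization, and the layer sizes here are all illustrative assumptions; the key structural points from the abstract are that the language embedding model stays frozen and only a small CNN + MLP head is trained.

```python
# Minimal sketch of a LETS-C-style pipeline. Illustrative assumptions
# (not specified in the abstract): time series are serialized to strings
# of values and embedded with the off-the-shelf sentence-transformers
# model "all-MiniLM-L6-v2"; the head's layer sizes are placeholders.
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer

# 1) Embed time series with a frozen language embedding model
#    (no fine-tuning of any LLM).
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed choice

def embed_series(series_batch):
    """Serialize each 1-D series to text and embed it: (batch, embed_dim)."""
    texts = [" ".join(f"{v:.3f}" for v in s) for s in series_batch]
    return torch.tensor(embedder.encode(texts))

# 2) Lightweight classification head: a 1-D CNN over the embedding
#    followed by an MLP, mirroring the CNN + MLP head in the abstract.
class ClassificationHead(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(32),  # pools any embedding length to 32
        )
        self.mlp = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 32, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, emb):              # emb: (batch, embed_dim)
        x = self.conv(emb.unsqueeze(1))  # (batch, 16, 32)
        return self.mlp(x)               # (batch, num_classes)

# Usage: only the head's parameters are trainable.
series = [[0.1, 0.5, 0.3, 0.9], [0.7, 0.2, 0.8, 0.4]]
logits = ClassificationHead(num_classes=3)(embed_series(series))
```

Freezing the embedder is what makes this design lightweight: in this sketch the trainable head amounts to only a few tens of thousands of parameters, consistent with the paper's claim of needing a small fraction of the trainable parameters of fine-tuned LLM baselines.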