LETS-C: Leveraging Language Embedding for Time Series Classification

Abstract

Recent advances in language modeling have shown promising results when applied to time series data. In particular, fine-tuning pre-trained large language models (LLMs) for time series classification tasks has achieved state-of-the-art (SOTA) performance on standard benchmarks. However, these LLM-based models have a significant drawback: their large model size, with trainable parameters numbering in the millions. In this paper, we propose an alternative approach to leveraging the success of language modeling in the time series domain. Instead of fine-tuning LLMs, we use a language embedding model to embed time series and then pair the embeddings with a simple classification head composed of a convolutional neural network (CNN) and a multilayer perceptron (MLP). We conducted extensive experiments on well-established time series classification benchmark datasets and demonstrated that our approach, LETS-C, not only outperforms the current SOTA in classification accuracy but also offers a lightweight solution, using on average only 14.5% of the trainable parameters of the SOTA model. Our findings suggest that leveraging language encoders to embed time series data, combined with a simple yet effective classification head, is a promising direction for achieving high-performance time series classification while maintaining a lightweight model architecture.
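As a rough illustration of the pipeline described above, and not the authors' implementation, the following pure-Python sketch embeds a series with a fixed embedding function and passes the result through a small CNN and MLP head. Everything named here is an assumption made for illustration: `stub_embed` is a hypothetical stand-in for a real language embedding model (it only hashes the formatted text into a deterministic vector), the embedding is assumed to stay frozen, and `TinyHead` with its layer sizes is invented for the example. The weights are random and untrained, so the output probabilities carry no meaning; the point is the data flow and the parameter count.

```python
import math
import random


def stub_embed(series, dim=16):
    """Hypothetical stand-in for a language embedding model (assumption).

    Formats the series as text and maps that text to a fixed-length vector.
    A real system would call an actual embedding model here.
    """
    text = " ".join(f"{x:.2f}" for x in series)
    rng = random.Random(text)  # deterministic: same text gives same vector
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


class TinyHead:
    """Illustrative head: 1D conv -> ReLU -> global max pool -> 2-layer MLP."""

    def __init__(self, n_filters=4, kernel=3, hidden=8, n_classes=2, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        # Only these weights would be trained; the embedder is assumed frozen.
        self.conv_w = matrix(n_filters, kernel)
        self.conv_b = [0.0] * n_filters
        self.fc1_w = matrix(hidden, n_filters)
        self.fc1_b = [0.0] * hidden
        self.fc2_w = matrix(n_classes, hidden)
        self.fc2_b = [0.0] * n_classes

    def n_params(self):
        """Count the trainable scalars in the head."""
        mats = (self.conv_w, self.fc1_w, self.fc2_w)
        vecs = (self.conv_b, self.fc1_b, self.fc2_b)
        return sum(len(row) for m in mats for row in m) + sum(len(v) for v in vecs)

    @staticmethod
    def _linear(weights, biases, x):
        return [b + sum(wi * xi for wi, xi in zip(w, x)) for w, b in zip(weights, biases)]

    def forward(self, emb):
        pooled = []
        for w, b in zip(self.conv_w, self.conv_b):
            k = len(w)
            # "Valid" 1D convolution along the embedding, followed by ReLU.
            acts = [max(0.0, b + sum(w[j] * emb[i + j] for j in range(k)))
                    for i in range(len(emb) - k + 1)]
            pooled.append(max(acts))  # global max pooling per filter
        hidden = [max(0.0, h) for h in self._linear(self.fc1_w, self.fc1_b, pooled)]
        return self._linear(self.fc2_w, self.fc2_b, hidden)  # class logits


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


series = [0.1, 0.5, 0.9, 0.4]
head = TinyHead()
probs = softmax(head.forward(stub_embed(series)))
print(probs, head.n_params())
```

Because only the head's weights are trainable in this sketch, `n_params()` stays small regardless of how large the embedding model is, which is the kind of saving the abstract reports.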
