Beyond Holistic Models: Systematic Component-Level Benchmarking of Deep Multivariate Time Series Forecasting

Abstract

While previous research in multivariate time series forecasting has focused on developing complex holistic models, this work advocates a shift toward a granular, component-level understanding of how individual components affect forecasting performance. We propose TSCOMP, the first large-scale benchmark that systematically deconstructs deep forecasting methods into their core, fine-grained components: series preprocessing, encoding strategies, network architectures (including specific and large time-series models), and optimization methods. Using a constrained orthogonal experimental design and extensive evaluations, we conduct multi-view analyses that reveal component effectiveness across different backbones, data characteristics, and their interactions. Beyond providing insights, the benchmark establishes a fine-grained performance corpus of over 20,000 model-dataset evaluations, which supports learning automated component selection and enables zero-shot model construction on new datasets. Our experiments demonstrate that this corpus-driven approach, despite its simplicity, consistently outperforms state-of-the-art methods, validating the soundness of our evaluation design and confirming that systematic component selection surpasses manually designed complex architectures. All code and the performance corpus are publicly available at https://github.com/SUFE-AILAB/TSCOMP.
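
To make the idea of corpus-driven, zero-shot component selection concrete, the following is a minimal sketch and not the TSCOMP implementation. It assumes a nearest-neighbor rule over dataset meta-features, which the abstract does not specify. The function `select_components`, the dataset names, the component labels, the meta-feature vectors, and all error values are hypothetical toy placeholders, not results from the benchmark.

```python
from collections import defaultdict
from math import dist

# Hypothetical meta-features per known dataset, e.g. (seasonality, trend strength).
META = {
    "dataset_a": (0.9, 0.1),
    "dataset_b": (0.8, 0.2),
    "dataset_c": (0.1, 0.9),
}

# Hypothetical corpus rows: (dataset, component configuration, forecast error).
# A configuration is (preprocessing, encoding, backbone); values are toy numbers.
CORPUS = [
    ("dataset_a", ("norm", "patch", "transformer"), 0.30),
    ("dataset_a", ("norm", "point", "mlp"), 0.40),
    ("dataset_b", ("norm", "patch", "transformer"), 0.35),
    ("dataset_b", ("norm", "point", "mlp"), 0.33),
    ("dataset_c", ("norm", "patch", "transformer"), 0.90),
    ("dataset_c", ("norm", "point", "mlp"), 0.20),
]


def select_components(new_features, meta_features, corpus, k=2):
    """Return the configuration with the lowest mean error on the k known
    datasets closest to the new dataset in meta-feature space."""
    # Rank known datasets by Euclidean distance to the new dataset.
    neighbors = sorted(
        meta_features, key=lambda name: dist(meta_features[name], new_features)
    )[:k]
    # Gather the recorded errors of each configuration on those neighbors.
    errors = defaultdict(list)
    for dataset, config, error in corpus:
        if dataset in neighbors:
            errors[config].append(error)
    if not errors:
        raise ValueError("no corpus entries for the nearest datasets")
    # No training on the new dataset: the choice relies only on the corpus.
    return min(errors, key=lambda c: sum(errors[c]) / len(errors[c]))


if __name__ == "__main__":
    print(select_components((0.85, 0.15), META, CORPUS, k=2))
```

With these toy values, a new dataset resembling `dataset_a` and `dataset_b` is assigned the transformer-based configuration, while one resembling `dataset_c` would be assigned the MLP-based one. A learned selector could replace the nearest-neighbor rule without changing the interface.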