This Time is Different: An Observability Perspective on Time Series Foundation Models
May 20, 2025
Authors: Ben Cohen, Emaad Khwaja, Youssef Doubli, Salahidine Lemaachi, Chris Lettieri, Charles Masson, Hugo Miccinilli, Elise Ramé, Qiqi Ren, Afshin Rostamizadeh, Jean Ogier du Terrail, Anna-Monica Toon, Kan Wang, Stephan Xie, David Asker, Ameet Talwalkar, Othmane Abou-Amal
cs.AI
Abstract
We introduce Toto, a time series forecasting foundation model with 151
million parameters. Toto uses a modern decoder-only architecture coupled with
architectural innovations designed to account for specific challenges found in
multivariate observability time series data. Toto's pre-training corpus is a
mixture of observability data, open datasets, and synthetic data, and is
4-10× larger than those of leading time series foundation models.
Additionally, we introduce BOOM, a large-scale benchmark consisting of 350
million observations across 2,807 real-world time series. For both Toto and
BOOM, we source observability data exclusively from Datadog's own telemetry and
internal observability metrics. Extensive evaluations demonstrate that Toto
achieves state-of-the-art performance both on BOOM and on established
general-purpose time series forecasting benchmarks. Toto's model weights,
inference code, and evaluation scripts, as well as BOOM's data and evaluation
code, are all available as open source under the Apache 2.0 License at
https://huggingface.co/Datadog/Toto-Open-Base-1.0 and
https://github.com/DataDog/toto.