Glocal Information Bottleneck for Time Series Imputation

Abstract

Time Series Imputation (TSI), which aims to recover missing values in temporal data, remains a fundamental challenge because missingness in real-world scenarios is complex and often occurs at high rates. Existing models typically optimize a point-wise reconstruction loss, focusing on recovering numerical values (local information). However, we observe that under high missing rates, these models still perform well during training yet produce poor imputations and distorted latent representation distributions (global information) at inference. This reveals a critical optimization dilemma: current objectives lack global guidance, leading models to overfit local noise and fail to capture the global information of the data. To address this issue, we propose a new training paradigm, Glocal Information Bottleneck (Glocal-IB). Glocal-IB is model-agnostic and extends the standard Information Bottleneck (IB) framework by introducing a Global Alignment loss, derived from a tractable mutual information approximation. This loss aligns the latent representations of masked inputs with those of their originally observed counterparts. It helps the model retain global structure and local details while suppressing noise caused by missing values, yielding better generalization under high missingness. Extensive experiments on nine datasets confirm that Glocal-IB consistently improves performance and produces aligned latent representations under missingness. Our code is available at https://github.com/Muyiiiii/NeurIPS-25-Glocal-IB.
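
To make the two kinds of objective concrete, the following is a minimal sketch of a training loss that combines a point-wise reconstruction term (local) with a latent alignment term (global). It is not the authors' implementation: the abstract does not give the exact form of the Global Alignment loss, so a cosine-distance term is assumed here as a stand-in, and the toy linear `encode`/`decode` functions, the weighting `lam`, and all other names are hypothetical.

```python
import math
import random


def encode(x, weights):
    """Toy linear encoder (hypothetical): z = W x, with W of shape (latent_dim, n)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def decode(z, weights):
    """Toy linear decoder (hypothetical): x_hat = W^T z."""
    n = len(weights[0])
    return [sum(weights[k][j] * z[k] for k in range(len(z))) for j in range(n)]


def reconstruction_loss(x_hat, x, eval_mask):
    """Local term: mean squared error over positions where eval_mask is 1."""
    errs = [(a - b) ** 2 for a, b, m in zip(x_hat, x, eval_mask) if m]
    return sum(errs) / len(errs) if errs else 0.0


def alignment_loss(z_masked, z_full):
    """Global term (assumed form): 1 - cosine similarity between the two latents."""
    dot = sum(a * b for a, b in zip(z_masked, z_full))
    norm = math.sqrt(sum(a * a for a in z_masked)) * math.sqrt(sum(b * b for b in z_full))
    if norm == 0.0:
        return 1.0  # treat a zero latent as unaligned
    return 1.0 - dot / norm


def glocal_loss(x, keep_mask, weights, lam=0.5):
    """Return (total, local, global) for one series.

    keep_mask[i] == 1 means position i stays visible; 0 means it is hidden.
    """
    x_masked = [v * m for v, m in zip(x, keep_mask)]  # zero out hidden positions
    z_masked = encode(x_masked, weights)  # latent of the masked input
    z_full = encode(x, weights)           # latent of the originally observed series
    x_hat = decode(z_masked, weights)
    hidden = [1 - m for m in keep_mask]
    local = reconstruction_loss(x_hat, x, hidden)  # scored on hidden positions only
    global_term = alignment_loss(z_masked, z_full)
    return local + lam * global_term, local, global_term


# Small demonstration on a synthetic sine series with a fixed random encoder.
rng = random.Random(0)
x = [math.sin(0.3 * t) for t in range(8)]
weights = [[rng.gauss(0.0, 1.0) for _ in range(len(x))] for _ in range(3)]
keep_mask = [1, 0, 1, 1, 0, 0, 1, 0]

total, local, global_term = glocal_loss(x, keep_mask, weights)
print(f"total={total:.4f} local={local:.4f} global={global_term:.4f}")
```

In this sketch the local term only sees the hidden positions, while the global term compares whole-series latents, so a model could lower the first without lowering the second. That gap is the kind of missing global guidance the abstract describes.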
