MIST: Mutual Information Via Supervised Training

November 24, 2025
Authors: German Gritsai, Megan Richards, Maxime Méloux, Kyunghyun Cho, Maxime Peyrard
cs.AI

Abstract

We propose a fully data-driven approach to designing mutual information (MI) estimators. Since any MI estimator is a function of the observed sample from two random variables, we parameterize this function with a neural network (MIST) and train it end-to-end to predict MI values. Training is performed on a large meta-dataset of 625,000 synthetic joint distributions with known ground-truth MI. To handle variable sample sizes and dimensions, we employ a two-dimensional attention scheme ensuring permutation invariance across input samples. To quantify uncertainty, we optimize a quantile regression loss, enabling the estimator to approximate the sampling distribution of MI rather than return a single point estimate. This research program departs from prior work by taking a fully empirical route, trading universal theoretical guarantees for flexibility and efficiency. Empirically, the learned estimators largely outperform classical baselines across sample sizes and dimensions, including on joint distributions unseen during training. The resulting quantile-based intervals are well-calibrated and more reliable than bootstrap-based confidence intervals, while inference is orders of magnitude faster than existing neural baselines. Beyond immediate empirical gains, this framework yields trainable, fully differentiable estimators that can be embedded into larger learning pipelines. Moreover, exploiting MI's invariance to invertible transformations, meta-datasets can be adapted to arbitrary data modalities via normalizing flows, enabling flexible training for diverse target meta-distributions.
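The abstract does not specify which distribution families populate the 625,000-entry meta-dataset, but the training recipe is easy to illustrate with correlated Gaussians, for which MI is available in closed form: with per-dimension correlation ρᵢ, I(X; Y) = −½ Σᵢ log(1 − ρᵢ²) nats. A minimal sketch of generating one (sample, ground-truth MI) training entry; `sample_gaussian_task` is an illustrative name, not the paper's API:

```python
import numpy as np

def sample_gaussian_task(n_samples, dim, rng):
    """Draw one synthetic (X, Y) pair with analytically known MI.

    X and Y are jointly Gaussian with per-dimension correlation rho_i,
    for which I(X; Y) = -0.5 * sum_i log(1 - rho_i^2) in nats.
    """
    rho = rng.uniform(-0.9, 0.9, size=dim)         # per-dimension correlation
    z = rng.standard_normal((n_samples, dim))
    eps = rng.standard_normal((n_samples, dim))
    x = z
    y = rho * z + np.sqrt(1.0 - rho**2) * eps      # Corr(x_i, y_i) = rho_i
    true_mi = -0.5 * np.sum(np.log1p(-rho**2))     # ground-truth label in nats
    return x, y, true_mi

rng = np.random.default_rng(0)
x, y, mi = sample_gaussian_task(n_samples=512, dim=4, rng=rng)
print(f"ground-truth MI: {mi:.3f} nats")  # one (sample, label) meta-dataset entry
```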
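The two key modeling ingredients, permutation-invariant attention over both the sample and dimension axes and a quantile-regression (pinball) objective, can also be sketched. The following PyTorch toy is a stand-in, not the actual MIST architecture (which the abstract does not detail): attention without positional encodings is permutation-equivariant, so attending and mean-pooling along each axis yields the required invariance, and the pinball loss trains the head to output quantiles of MI's sampling distribution. `TinyMISketch` and `pinball_loss` are hypothetical names:

```python
import torch
import torch.nn as nn

class TinyMISketch(nn.Module):
    """Toy permutation-invariant quantile estimator of MI (not the paper's model).

    Scalar coordinates of a paired sample form an (n_samples, n_dims) token
    grid. Attention (no positional encoding) runs along the dimension axis,
    then along the sample axis, with mean-pooling after each, so the output
    does not depend on the ordering of the input samples. The head emits K
    quantiles of MI instead of a single point estimate.
    """

    def __init__(self, hidden=64, n_heads=4, taus=(0.05, 0.5, 0.95)):
        super().__init__()
        self.register_buffer("taus", torch.tensor(taus))
        self.embed = nn.Linear(2, hidden)  # one (x_ij, y_ij) pair -> one token
        self.attn_dims = nn.MultiheadAttention(hidden, n_heads, batch_first=True)
        self.attn_samples = nn.MultiheadAttention(hidden, n_heads, batch_first=True)
        self.head = nn.Linear(hidden, len(taus))  # one output per quantile level

    def forward(self, x, y):
        # x, y: (n_samples, n_dims) with matching shapes
        t = self.embed(torch.stack([x, y], dim=-1))      # (n, d, hidden)
        t, _ = self.attn_dims(t, t, t)                   # attend across dimensions
        t = t.mean(dim=1, keepdim=True).transpose(0, 1)  # pool dims -> (1, n, hidden)
        t, _ = self.attn_samples(t, t, t)                # attend across samples
        return self.head(t.mean(dim=1)).squeeze(0)       # (K,) predicted quantiles

def pinball_loss(pred_quantiles, target_mi, taus):
    """Quantile-regression (pinball) loss, averaged over quantile levels."""
    diff = target_mi - pred_quantiles
    return torch.maximum(taus * diff, (taus - 1.0) * diff).mean()

model = TinyMISketch()
pred = model(torch.randn(512, 4), torch.randn(512, 4))  # three quantile estimates
loss = pinball_loss(pred, torch.tensor(1.0), model.taus)
loss.backward()  # fully differentiable, so it can sit inside larger pipelines
```

The low and high predicted quantiles (here τ = 0.05 and 0.95) directly give the calibrated intervals the abstract describes, with no bootstrap resampling at inference time.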
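The closing claim rests on a standard fact: I(f(X); g(Y)) = I(X; Y) whenever f and g are invertible, so pushing meta-dataset samples through normalizing flows changes the data modality without changing the ground-truth label. A toy sketch, with simple strictly monotone maps standing in for trained flows:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((512, 4))
y = 0.8 * x + 0.6 * rng.standard_normal((512, 4))  # Gaussian pair, rho = 0.8
mi = -0.5 * 4 * np.log(1.0 - 0.8**2)               # known MI label in nats

# Strictly monotone elementwise maps are invertible, and MI satisfies
# I(f(X); g(Y)) = I(X; Y) for invertible f, g. Trained normalizing flows
# play this role in the paper; sinh and v -> v**3 + v are toy stand-ins.
x_new = np.sinh(x)
y_new = y**3 + y

# (x_new, y_new) is a new "modality" that carries the same label `mi`,
# extending the meta-dataset without recomputing ground truth.
```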