

MIST: Mutual Information Via Supervised Training

November 24, 2025
作者: German Gritsai, Megan Richards, Maxime Méloux, Kyunghyun Cho, Maxime Peyrard
cs.AI

Abstract

We propose a fully data-driven approach to designing mutual information (MI) estimators. Since any MI estimator is a function of the observed sample from two random variables, we parameterize this function with a neural network (MIST) and train it end-to-end to predict MI values. Training is performed on a large meta-dataset of 625,000 synthetic joint distributions with known ground-truth MI. To handle variable sample sizes and dimensions, we employ a two-dimensional attention scheme ensuring permutation invariance across input samples. To quantify uncertainty, we optimize a quantile regression loss, enabling the estimator to approximate the sampling distribution of MI rather than return a single point estimate. This research program departs from prior work by taking a fully empirical route, trading universal theoretical guarantees for flexibility and efficiency. Empirically, the learned estimators largely outperform classical baselines across sample sizes and dimensions, including on joint distributions unseen during training. The resulting quantile-based intervals are well-calibrated and more reliable than bootstrap-based confidence intervals, while inference is orders of magnitude faster than existing neural baselines. Beyond immediate empirical gains, this framework yields trainable, fully differentiable estimators that can be embedded into larger learning pipelines. Moreover, exploiting MI's invariance to invertible transformations, meta-datasets can be adapted to arbitrary data modalities via normalizing flows, enabling flexible training for diverse target meta-distributions.
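To make the core idea concrete, below is a minimal sketch (not the authors' code) of training a permutation-invariant network to map a sample {(x_i, y_i)} to quantiles of MI. Simple mean pooling (Deep Sets style) stands in for the paper's two-dimensional attention, and correlated 1-D Gaussians, where MI(X;Y) = -0.5 * log(1 - rho^2) in closed form, stand in for the 625,000-distribution meta-dataset. All names (SetEstimator, pinball_loss, sample_gaussian_task) are hypothetical.

```python
import torch
import torch.nn as nn

QUANTILES = torch.tensor([0.05, 0.5, 0.95])  # lower bound, median, upper bound

class SetEstimator(nn.Module):
    """Per-pair MLP followed by mean pooling, so the output is
    invariant to the order of the input samples."""
    def __init__(self, hidden=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU())
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, len(QUANTILES)))

    def forward(self, xy):                 # xy: (batch, n_samples, 2)
        pooled = self.phi(xy).mean(dim=1)  # mean over samples -> order-free
        return self.rho(pooled)            # (batch, n_quantiles)

def pinball_loss(pred, target, quantiles=QUANTILES):
    """Quantile-regression (pinball) loss: max(tau*e, (tau-1)*e)."""
    err = target.unsqueeze(1) - pred       # (batch, n_quantiles)
    return torch.maximum(quantiles * err, (quantiles - 1) * err).mean()

def sample_gaussian_task(batch, n):
    """Correlated Gaussian pairs with known MI = -0.5 * log(1 - rho^2)."""
    rho = torch.rand(batch) * 1.8 - 0.9    # rho in (-0.9, 0.9)
    x = torch.randn(batch, n)
    noise = torch.randn(batch, n)
    y = rho.unsqueeze(1) * x + torch.sqrt(1 - rho**2).unsqueeze(1) * noise
    mi = -0.5 * torch.log(1 - rho**2)      # ground-truth label
    return torch.stack([x, y], dim=-1), mi

model = SetEstimator()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):
    xy, mi = sample_gaussian_task(batch=32, n=256)
    loss = pinball_loss(model(xy), mi)
    opt.zero_grad(); loss.backward(); opt.step()
```

At inference, reading off the 0.05 and 0.95 outputs gives the quantile-based confidence interval the abstract describes; the sketch omits details such as quantile-crossing constraints and the attention over both the sample and dimension axes.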
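The modality-adaptation claim rests on MI's invariance to invertible transformations of each variable: pushing (x, y) through invertible maps changes the marginal distributions but leaves the ground-truth MI label intact, so labeled training tasks can be remapped toward a target modality. In the sketch below, fixed elementwise squashings stand in for a trained normalizing flow, and remap_task is a hypothetical helper that could be applied to batches from sample_gaussian_task above.

```python
import torch

def remap_task(xy, mi):
    """Apply invertible elementwise maps to x and y separately;
    the MI label is unchanged by construction."""
    x, y = xy[..., 0], xy[..., 1]
    x_new = torch.tanh(x)                         # bijection R -> (-1, 1)
    y_new = torch.sign(y) * torch.log1p(y.abs())  # monotone bijection on R
    return torch.stack([x_new, y_new], dim=-1), mi  # same MI, new marginals
```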