Predicting integers from continuous parameters

Abstract

We study the problem of predicting numeric labels that are constrained to the integers or to a subrange of the integers, for example the number of up-votes on a social media post or the number of bicycles available at a public rental station. While it is possible to model such labels as continuous values and apply traditional regression, this approach changes the underlying distribution on the labels from discrete to continuous. Discrete distributions have certain benefits, which leads us to the question of whether such integer labels can be modeled directly by a discrete distribution whose parameters are predicted from the features of a given instance. Moreover, we focus on the use case of output distributions of neural networks, which adds the requirement that the parameters of the distribution be continuous, so that backpropagation and gradient descent can be used to learn the weights of the network. We investigate several options for such distributions, some existing and some novel, and test them on a range of tasks, including tabular learning, sequential prediction and image generation. We find that, overall, the best performance comes from two distributions: Bitwise, which represents the target integer in bits and places a Bernoulli distribution on each bit, and a discrete analogue of the Laplace distribution, which has exponentially decaying tails around a continuous mean.
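To make the two best-performing distributions concrete, the following is a minimal illustrative sketch of their log-probabilities, not the implementation evaluated in this work. The function names, the least-significant-bit-first ordering, and the particular parameterization of the discrete Laplace analogue (probability proportional to exp(-|k - mu| / scale) over all integers, normalized in closed form) are assumptions made for illustration; the definitions in the body of the paper may differ.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def bitwise_log_prob(n, logits):
    """Log-probability of integer n under independent Bernoulli bits.

    Assumption: logits[i] is the logit of bit i being 1, least
    significant bit first, so n must lie in [0, 2**len(logits)).
    """
    num_bits = len(logits)
    if not 0 <= n < 2 ** num_bits:
        raise ValueError("n is not representable in the given number of bits")
    total = 0.0
    for i, logit in enumerate(logits):
        bit = (n >> i) & 1
        # log p(bit = 1) = log sigmoid(logit); log p(bit = 0) = log sigmoid(-logit)
        total += log_sigmoid(logit) if bit else log_sigmoid(-logit)
    return total


def discrete_laplace_log_prob(k, mu, scale):
    """Log-probability of integer k, with p(k) proportional to exp(-|k - mu| / scale).

    Assumption: support is all integers, mu is any real number, scale > 0.
    With q = exp(-1 / scale) and f the fractional part of mu, the
    normalizer is Z = (q**f + q**(1 - f)) / (1 - q).
    """
    q = math.exp(-1.0 / scale)
    f = mu - math.floor(mu)
    log_z = math.log(q ** f + q ** (1.0 - f)) - math.log1p(-q)
    return -abs(k - mu) / scale - log_z
```

In a neural network setting, the logits, or the pair mu and scale, would be the outputs of the network for a given instance, and the negative log-probability of the observed integer would serve as the loss. Both expressions are differentiable in their continuous parameters almost everywhere, which is the property the abstract requires for training by gradient descent.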