Predicting integers from continuous parameters

Abstract

We study the problem of predicting numeric labels that are constrained to the integers or to a subrange of the integers. Examples include the number of up-votes on a social media post and the number of bicycles available at a public rental station. While it is possible to model these labels as continuous values and apply traditional regression, doing so changes the underlying distribution on the labels from discrete to continuous. Discrete distributions have certain benefits, which leads us to ask whether such integer labels can be modeled directly by a discrete distribution whose parameters are predicted from the features of a given instance. Moreover, we focus on the use case of output distributions of neural networks, which adds the requirement that the parameters of the distribution be continuous, so that backpropagation and gradient descent can be used to learn the weights of the network. We investigate several options for such distributions, some existing and some novel, and test them on a range of tasks, including tabular learning, sequential prediction and image generation. We find that, overall, the best performance comes from two distributions: Bitwise, which represents the target integer in bits and places a Bernoulli distribution on each bit, and a discrete analogue of the Laplace distribution, which has exponentially decaying tails around a continuous mean.
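To make the two distributions named above concrete, the following is a minimal sketch of their log-probabilities, written under stated assumptions rather than taken from the work itself. The function names, the least-significant-bit-first ordering, and the specific discrete Laplace form p(k) proportional to exp(-|k - mu| / b) over all integers k are illustrative choices; the parameterization used in the actual experiments may differ.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -softplus(-x).
    return -(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))


def bitwise_log_prob(k, logits):
    """Log-probability of integer k under independent Bernoulli bits.

    Assumption: logits[i] is the continuous logit for bit i,
    least significant bit first, so k must lie in [0, 2**len(logits)).
    """
    if not 0 <= k < 2 ** len(logits):
        raise ValueError("k is outside the range representable by the bits")
    total = 0.0
    for i, z in enumerate(logits):
        bit = (k >> i) & 1
        # Bernoulli log-likelihood: log sigmoid(z) for a 1, log sigmoid(-z) for a 0.
        total += log_sigmoid(z if bit else -z)
    return total


def discrete_laplace_log_prob(k, mu, b):
    """Log-probability of integer k under p(k) proportional to exp(-|k - mu| / b).

    Assumption: support is all integers, mu is a continuous location
    and b > 0 a continuous scale. The normalizer is a pair of geometric
    series, one on each side of mu, summed in closed form.
    """
    q = math.exp(-1.0 / b)
    f = mu - math.floor(mu)  # fractional part of mu, in [0, 1)
    log_z = math.log(q ** f + q ** (1.0 - f)) - math.log1p(-q)
    return -abs(k - mu) / b - log_z
```

In both cases every parameter (the logits, mu and b) is continuous, so the negative log-probability of an observed integer could serve as a differentiable training loss on the outputs of a network.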
