Predicting integers from continuous parameters

Abstract


We study the problem of predicting numeric labels that are constrained to the integers or to a subrange of the integers, such as the number of up-votes on a social media post or the number of bicycles available at a public rental station. While it is possible to model these labels as continuous values and apply traditional regression, this approach changes the underlying distribution on the labels from discrete to continuous. Discrete distributions have certain benefits, which raises the question of whether such integer labels can be modeled directly by a discrete distribution whose parameters are predicted from the features of a given instance. Moreover, we focus on the use case of output distributions of neural networks, which adds the requirement that the parameters of the distribution be continuous, so that backpropagation and gradient descent can be used to learn the weights of the network. We investigate several options for such distributions, some existing and some novel, and test them on a range of tasks, including tabular learning, sequential prediction and image generation. We find that, overall, the best performance comes from two distributions: Bitwise, which represents the target integer in bits and places a Bernoulli distribution on each bit, and a discrete analogue of the Laplace distribution, which places exponentially decaying tails around a continuous mean.
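The abstract names the two best-performing distributions without giving their formulas. The following is a minimal sketch of how their log-probabilities could be computed. It is our own illustration, not the paper's definition, and it rests on two assumptions. First, the Bitwise distribution covers non-negative integers below 2^n, with one logit per bit, least-significant bit first. Second, the discrete Laplace analogue is taken to be p(k) ∝ exp(-|k - mu| / scale), normalized over a finite integer range lo..hi. The function names and parameters (`bitwise_log_prob`, `discrete_laplace_log_prob`, `logits`, `mu`, `scale`, `lo`, `hi`) are hypothetical, and the paper's exact parameterization may differ.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def bitwise_log_prob(k: int, logits: list[float]) -> float:
    """Log-probability of the non-negative integer k under independent
    Bernoulli distributions on its bits (hypothetical parameterization:
    logits[i] is the logit of bit i, least-significant bit first)."""
    n_bits = len(logits)
    if not 0 <= k < 2 ** n_bits:
        raise ValueError("k is outside the representable range")
    total = 0.0
    for i, z in enumerate(logits):
        bit = (k >> i) & 1
        # log P(bit = 1) = log sigmoid(z); log P(bit = 0) = log sigmoid(-z)
        total += log_sigmoid(z) if bit else log_sigmoid(-z)
    return total


def discrete_laplace_log_prob(k: int, mu: float, scale: float,
                              lo: int, hi: int) -> float:
    """Log-probability of k under the assumed form
    p(j) proportional to exp(-|j - mu| / scale) for j in lo..hi.
    mu and scale are continuous, so a network can predict them and be
    trained by gradient descent on the negative log-likelihood."""
    if not lo <= k <= hi:
        raise ValueError("k is outside the support")
    if scale <= 0:
        raise ValueError("scale must be positive")
    # Unnormalized log-weights for every integer in the support.
    energies = [-abs(j - mu) / scale for j in range(lo, hi + 1)]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(energies)
    log_z = m + math.log(sum(math.exp(e - m) for e in energies))
    return -abs(k - mu) / scale - log_z
```

Both functions return log-probabilities, which is the form a negative log-likelihood loss would consume. With all-zero logits, every bit in `bitwise_log_prob` is a fair coin, so each value in 0..7 under three bits receives probability 1/8. The discrete Laplace sketch puts its mode at the integer nearest to `mu`, even though `mu` itself is continuous.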
