Predicting integers from continuous parameters

Abstract

We study the problem of predicting numeric labels that are constrained to the integers or to a subrange of the integers. Examples include the number of upvotes on social media posts and the number of bicycles available at a public rental station. While it is possible to model these as continuous values and to apply traditional regression, this approach changes the underlying distribution on the labels from discrete to continuous. Discrete distributions have certain benefits, which leads us to the question of whether such integer labels can be modeled directly by a discrete distribution whose parameters are predicted from the features of a given instance. Moreover, we focus on the use case of output distributions of neural networks, which adds the requirement that the parameters of the distribution be continuous, so that backpropagation and gradient descent may be used to learn the weights of the network. We investigate several options for such distributions, some existing and some novel, and test them on a range of tasks, including tabular learning, sequential prediction and image generation. We find that overall the best performance comes from two distributions: Bitwise, which represents the target integer in bits and places a Bernoulli distribution on each bit, and a discrete analogue of the Laplace distribution, which uses a distribution with exponentially decaying tails around a continuous mean.
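To make the two distributions named above concrete, the following is a minimal sketch of their log-probabilities in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names, the least-significant-bit-first ordering, and the parameterisation of the discrete Laplace (probability proportional to exp(-|n - mu| / scale) over all integers) are hypothetical choices, since the abstract does not specify them.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    return min(z, 0.0) - math.log1p(math.exp(-abs(z)))


def bitwise_log_prob(n, logits):
    """Log-probability of integer n under one independent Bernoulli per bit.

    Assumption: logits[i] is the logit that bit i of n equals 1,
    least significant bit first, so n must lie in [0, 2**len(logits)).
    """
    k = len(logits)
    if not 0 <= n < 2 ** k:
        raise ValueError("n is outside the range representable in k bits")
    total = 0.0
    for i, logit in enumerate(logits):
        bit = (n >> i) & 1
        # log p(bit = 1) = log_sigmoid(logit); log p(bit = 0) = log_sigmoid(-logit)
        total += log_sigmoid(logit if bit else -logit)
    return total


def discrete_laplace_log_prob(n, mu, scale):
    """Log-probability of integer n with exponentially decaying tails around mu.

    Assumption: p(n) is proportional to exp(-|n - mu| / scale) over all
    integers, with a continuous mean mu and scale > 0. The normaliser is
    the sum of two geometric series, one on each side of mu.
    """
    q = math.exp(-1.0 / scale)          # decay ratio per unit of distance
    f = mu - math.floor(mu)             # fractional part of mu, in [0, 1)
    log_z = math.log(q ** f + q ** (1.0 - f)) - math.log1p(-q)
    return -abs(n - mu) / scale - log_z
```

In both cases the parameters (the logits, or mu and scale) are continuous, so the negative log-probability could serve as a differentiable loss when the same arithmetic is written in an automatic differentiation framework.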

