Continuous Visual Autoregressive Generation via Score Maximization

Summary

Conventional wisdom holds that autoregressive models are meant for discrete data. When applied to continuous modalities such as visual data, Visual AutoRegressive modeling (VAR) typically resorts to quantization-based approaches that cast the data into a discrete space, which can introduce significant information loss. To tackle this issue, we introduce a Continuous VAR framework that enables direct visual autoregressive generation without vector quantization. Its theoretical foundation is the family of strictly proper scoring rules, statistical tools that evaluate how well a generative model approximates the true distribution. Within this framework, all we need to do is select a strictly proper score and optimize it as the training objective. We primarily explore a class of training objectives based on the energy score, which is likelihood-free and thus overcomes the difficulty of making probabilistic predictions in continuous space. Previous efforts on continuous autoregressive generation, such as GIVT and diffusion loss, can also be derived from our framework using other strictly proper scores. Source code: https://github.com/shaochenze/EAR.
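To make the idea of a likelihood-free objective concrete, here is a minimal sketch of a Monte Carlo energy-score loss. It assumes the textbook form of the energy score, E||X - y||^alpha - (1/2) E||X - X'||^alpha with alpha in (0, 2), estimated from a handful of model samples. The names `norm_pow`, `energy_loss`, and `avg_loss` are hypothetical and are not taken from the EAR repository, whose actual objective and sampling scheme may differ.

```python
import math
import random


def norm_pow(u, v, alpha):
    """Return ||u - v||_2 ** alpha for two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v))) ** alpha


def energy_loss(samples, target, alpha=1.0):
    """Monte Carlo estimate of the negative energy score (lower is better).

    samples: list of at least two model draws, each a list of floats
    target:  the observed data point, a list of floats
    alpha:   exponent, assumed to lie in (0, 2)
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    # Fidelity term: mean distance from each sample to the target.
    fit = sum(norm_pow(x, target, alpha) for x in samples) / n
    # Diversity term: mean distance over all distinct sample pairs.
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    spread = sum(norm_pow(samples[i], samples[j], alpha) for i, j in pairs) / len(pairs)
    return fit - 0.5 * spread


def avg_loss(sampler, trials=2000):
    """Average the loss over targets drawn from a standard normal (toy data)."""
    total = 0.0
    for _ in range(trials):
        y = [random.gauss(0.0, 1.0)]
        total += energy_loss([sampler(), sampler()], y)
    return total / trials


random.seed(0)
# A sampler that matches the data distribution versus one that collapses to the mean.
matched = avg_loss(lambda: [random.gauss(0.0, 1.0)])
collapsed = avg_loss(lambda: [0.0])
print(f"matched sampler loss:   {matched:.3f}")
print(f"collapsed sampler loss: {collapsed:.3f}")
```

The loss needs only samples from the model, never a density value, which is what likelihood-free means here. In this toy setup the sampler that matches the data distribution should obtain the lower average loss, while the collapsed sampler is penalized because its diversity term is zero.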
