Continuous Visual Autoregressive Generation via Score Maximization

Abstract

Conventional wisdom suggests that autoregressive models are used to process discrete data. When applied to continuous modalities such as visual data, Visual AutoRegressive modeling (VAR) typically resorts to quantization-based approaches to cast the data into a discrete space, which can introduce significant information loss. To tackle this issue, we introduce a Continuous VAR framework that enables direct visual autoregressive generation without vector quantization. The underlying theoretical foundation is strictly proper scoring rules, which provide powerful statistical tools capable of evaluating how well a generative model approximates the true distribution. Within this framework, all we need is to select a strictly proper score and set it as the training objective to optimize. We primarily explore a class of training objectives based on the energy score, which is likelihood-free and thus overcomes the difficulty of making probabilistic predictions in the continuous space. Previous efforts on continuous autoregressive generation, such as GIVT and diffusion loss, can also be derived from our framework using other strictly proper scores. Source code: https://github.com/shaochenze/EAR.
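
To make the energy-score objective concrete, the following is a minimal sketch of a sample-based estimator, assuming the standard positively oriented definition 0.5 * E||X - X'||^alpha - E||X - y||^alpha with 0 < alpha < 2. The function name `energy_score`, the helper `gaussian_samples`, and the toy Gaussian "models" are hypothetical illustrations, not the authors' implementation. The abstract does not specify the exact scaling, the number of samples, or the network that produces them. The point is only that the score is computed from model samples and needs no likelihood.

```python
import math
import random


def energy_score(samples, target, alpha=1.0):
    """Monte Carlo estimate of the energy score of `samples` against `target`.

    Higher is better. Assumed form (not taken from the paper):
    0.5 * E||X - X'||^alpha - E||X - y||^alpha, with 0 < alpha < 2.
    """
    if not 0.0 < alpha < 2.0:
        raise ValueError("alpha must lie in (0, 2) for strict propriety")
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    # Fidelity term: average distance from each sample to the observed target.
    to_target = sum(math.dist(x, target) ** alpha for x in samples) / n
    # Diversity term: average distance over all distinct sample pairs.
    pair_sum = sum(
        math.dist(samples[i], samples[j]) ** alpha
        for i in range(n)
        for j in range(i + 1, n)
    )
    between = pair_sum / (n * (n - 1) / 2)
    return 0.5 * between - to_target


def gaussian_samples(rng, mean, n, dim=2):
    """Hypothetical stand-in for a generative model: isotropic Gaussian draws."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


def average_score(rng, model_mean, targets, n_samples=16):
    """Average energy score of a toy model over a set of observed targets."""
    scores = [
        energy_score(gaussian_samples(rng, model_mean, n_samples), y)
        for y in targets
    ]
    return sum(scores) / len(scores)


rng = random.Random(0)
targets = gaussian_samples(rng, 0.0, 200)          # data drawn from the "true" distribution
matched = average_score(rng, 0.0, targets)         # model that matches the data
mismatched = average_score(rng, 3.0, targets)      # model with a shifted mean
```

In this toy setting the matched model receives the higher average score, which is the behavior a strictly proper scoring rule is meant to reward. In training, the samples would instead come from a differentiable sampler and the score would be maximized by gradient ascent.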
