Continuous Visual Autoregressive Generation via Score Maximization

Abstract

Conventional wisdom holds that autoregressive models are suited to discrete data. When applied to continuous modalities such as visual data, Visual AutoRegressive modeling (VAR) typically resorts to quantization-based approaches to cast the data into a discrete space, which can incur significant information loss. To address this, we introduce a Continuous VAR framework that performs visual autoregressive generation directly, without vector quantization. Its theoretical foundation is strictly proper scoring rules, statistical tools that evaluate how well a generative model approximates the true distribution. Within this framework, all we need to do is select a strictly proper score and optimize it as the training objective. We primarily explore a class of training objectives based on the energy score, which is likelihood-free and thus sidesteps the difficulty of making probabilistic predictions in continuous space. Previous approaches to continuous autoregressive generation, such as GIVT and diffusion loss, can also be derived from our framework by using other strictly proper scores. Source code: https://github.com/shaochenze/EAR.
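The energy score is likelihood-free because it needs only samples from the model, never a density. As a minimal sketch, assuming the standard positively oriented form S(P, y) = 0.5 * E||X - X'||^beta - E||X - y||^beta, with X and X' drawn independently from the model P and 0 < beta < 2 (the range in which it is strictly proper), the NumPy function below computes a Monte Carlo estimate from m >= 2 model samples. The name `energy_score` and its arguments are illustrative assumptions, not the API of the EAR repository.

```python
import numpy as np


def energy_score(samples: np.ndarray, y: np.ndarray, beta: float = 1.0) -> float:
    """Monte Carlo estimate of the positively oriented energy score (higher is better).

    S(P, y) = 0.5 * E||X - X'||^beta - E||X - y||^beta, with X, X' ~ P i.i.d.
    Hypothetical helper for illustration; strictly proper only for 0 < beta < 2.

    samples: shape (m, d), m >= 2 draws from the model P.
    y:       shape (d,), the observed ground-truth vector.
    """
    if not 0.0 < beta < 2.0:
        raise ValueError("beta must lie in (0, 2) for strict propriety")
    if samples.ndim != 2 or samples.shape[0] < 2:
        raise ValueError("samples must have shape (m, d) with m >= 2")
    m = samples.shape[0]

    # Fidelity term: mean distance from each model sample to the target.
    fidelity = np.mean(np.linalg.norm(samples - y, axis=1) ** beta)

    # Diversity term: mean distance over ordered pairs of distinct samples.
    # Diagonal entries are 0, so summing everything and dividing by m(m-1)
    # gives the unbiased pairwise average.
    diffs = samples[:, None, :] - samples[None, :, :]
    pairwise = np.linalg.norm(diffs, axis=-1) ** beta
    diversity = pairwise.sum() / (m * (m - 1))

    return 0.5 * diversity - fidelity


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.zeros(2)
    centred = rng.normal(loc=0.0, scale=1.0, size=(256, 2))  # model centred on y
    shifted = rng.normal(loc=3.0, scale=1.0, size=(256, 2))  # model shifted away from y
    # The centred model should receive the higher score.
    print(energy_score(centred, y), energy_score(shifted, y))
```

Because the score is maximized, a training loss would be its negation. Propagating gradients through the sampled outputs would require an autodiff framework; NumPy is used here only to make the arithmetic explicit.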
