On Computational Limits and Provably Efficient Criteria of Visual Autoregressive Models: A Fine-Grained Complexity Analysis

Abstract

Recently, Visual Autoregressive (VAR) models introduced a groundbreaking advancement in image generation, offering a scalable approach through a coarse-to-fine "next-scale prediction" paradigm. However, the state-of-the-art algorithm for VAR models in [Tian, Jiang, Yuan, Peng and Wang, NeurIPS 2024] takes O(n^4) time, which is computationally inefficient. In this work, we analyze the computational limits and efficiency criteria of VAR models through a fine-grained complexity lens. Our key contribution is identifying the conditions under which VAR computations can achieve sub-quadratic time complexity. Specifically, we establish a critical threshold for the norm of the input matrices used in VAR attention mechanisms. Above this threshold, assuming the Strong Exponential Time Hypothesis (SETH) from fine-grained complexity theory, a sub-quartic time algorithm for VAR models is impossible. To substantiate our theoretical findings, we present efficient constructions leveraging low-rank approximations that align with the derived criteria. This work initiates the study of the computational efficiency of the VAR model from a theoretical perspective. Our technique will shed light on advancing scalable and efficient image generation in VAR frameworks.
