Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

Abstract

Autoregressive visual generation models typically rely on tokenizers to compress images into tokens that can be predicted sequentially. A fundamental dilemma exists in token representation: discrete tokens enable straightforward modeling with standard cross-entropy loss, but suffer from information loss and tokenizer training instability; continuous tokens better preserve visual details, but require complex distribution modeling, which complicates the generation pipeline. In this paper, we propose TokenBridge, which bridges this gap by maintaining the strong representation capacity of continuous tokens while preserving the modeling simplicity of discrete tokens. To achieve this, we decouple discretization from the tokenizer training process through post-training quantization that obtains discrete tokens directly from continuous representations. Specifically, we introduce a dimension-wise quantization strategy that independently discretizes each feature dimension, paired with a lightweight autoregressive prediction mechanism that efficiently models the resulting large token space. Extensive experiments show that our approach achieves reconstruction and generation quality on par with continuous methods while using standard categorical prediction. This work demonstrates that bridging discrete and continuous paradigms can effectively harness the strengths of both approaches, providing a promising direction for high-quality visual generation with simple autoregressive modeling. Project page: https://yuqingwang1029.github.io/TokenBridge.
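
The dimension-wise quantization idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `quantize_dimwise` and `dequantize_dimwise`, the uniform bins, the clipping range `[-3, 3]`, and the choice of 8 levels are all assumptions made for illustration, and the actual method may place its bins differently. The sketch only shows that each feature dimension of an already-trained continuous latent can be mapped to a categorical index independently, with no change to the tokenizer itself.

```python
from typing import List


def quantize_dimwise(vec: List[float], levels: int = 8,
                     lo: float = -3.0, hi: float = 3.0) -> List[int]:
    """Map each feature dimension independently to a bin index in [0, levels).

    Assumption: uniform bins over a fixed clipping range [lo, hi].
    """
    step = (hi - lo) / levels
    indices = []
    for x in vec:
        clipped = min(max(x, lo), hi)            # keep the value inside the range
        idx = min(int((clipped - lo) / step), levels - 1)  # hi falls in the last bin
        indices.append(idx)
    return indices


def dequantize_dimwise(indices: List[int], levels: int = 8,
                       lo: float = -3.0, hi: float = 3.0) -> List[float]:
    """Recover an approximate continuous value as the center of each bin."""
    step = (hi - lo) / levels
    return [lo + (i + 0.5) * step for i in indices]


# Hypothetical 4-dimensional continuous token from a pretrained tokenizer.
latent = [-2.9, -0.1, 0.4, 5.0]
codes = quantize_dimwise(latent)      # one categorical index per dimension
recon = dequantize_dimwise(codes)     # bin centers fed back to the decoder
```

Under these assumptions, a token with 16 dimensions and 8 levels per dimension would have 8**16 possible joint values, far too many for a single softmax. This is the "large token space" that the abstract addresses by predicting the per-dimension indices autoregressively rather than jointly.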
