iFSQ: Improving FSQ for Image Generation with 1 Line of Code
January 23, 2026
Authors: Bin Lin, Zongjian Li, Yuwei Niu, Kaixiong Gong, Yunyang Ge, Yunlong Lin, Mingzhe Zheng, JianWei Zhang, Miles Yang, Zhao Zhong, Liefeng Bo, Li Yuan
cs.AI
Abstract
The field of image generation is currently bifurcated into autoregressive (AR) models operating on discrete tokens and diffusion models utilizing continuous latents. This divide, rooted in the distinction between VQ-VAEs and VAEs, hinders unified modeling and fair benchmarking. Finite Scalar Quantization (FSQ) offers a theoretical bridge, yet vanilla FSQ suffers from a critical flaw: its equal-interval quantization can cause activation collapse. This mismatch forces a trade-off between reconstruction fidelity and information efficiency. In this work, we resolve this dilemma by simply replacing the activation function in original FSQ with a distribution-matching mapping to enforce a uniform prior. Termed iFSQ, this simple strategy requires just one line of code yet mathematically guarantees both optimal bin utilization and reconstruction precision. Leveraging iFSQ as a controlled benchmark, we uncover two key insights: (1) The optimal equilibrium between discrete and continuous representations lies at approximately 4 bits per dimension. (2) Under identical reconstruction constraints, AR models exhibit rapid initial convergence, whereas diffusion models achieve a superior performance ceiling, suggesting that strict sequential ordering may limit the upper bounds of generation quality. Finally, we extend our analysis by adapting Representation Alignment (REPA) to AR models, yielding LlamaGen-REPA. Code is available at https://github.com/Tencent-Hunyuan/iFSQ
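To make the abstract's "one line of code" concrete, the following is a minimal NumPy sketch of scalar quantization. It is not the authors' implementation: the vanilla branch follows the standard FSQ recipe (bound with `tanh`, round to equally spaced bins), and the iFSQ branch assumes, purely for illustration, that a Gaussian-CDF-style mapping (`erf(z / sqrt(2))`) is one plausible distribution-matching activation, since it sends roughly Gaussian encoder outputs to a uniform distribution on [-1, 1] and thus equalizes bin usage.

```python
import numpy as np
from math import erf, sqrt

def fsq_quantize(z, levels=16):
    """Vanilla FSQ: bound with tanh, then round to equal-interval bins in [-1, 1]."""
    z_bounded = np.tanh(z)
    half = (levels - 1) / 2.0
    return np.round(z_bounded * half) / half  # straight-through grad omitted

def ifsq_quantize(z, levels=16):
    """iFSQ-style sketch (assumption): swap the activation for a
    distribution-matching map. If encoder outputs are roughly standard
    Gaussian, erf(z / sqrt(2)) = 2*CDF(z) - 1 is uniform on [-1, 1],
    so the equal-interval bins below are utilized evenly."""
    z_bounded = np.vectorize(erf)(z / sqrt(2.0))
    half = (levels - 1) / 2.0
    return np.round(z_bounded * half) / half

# Usage: quantize the same latents with both variants.
z = np.random.randn(8, 4)          # mock encoder outputs
q_fsq = fsq_quantize(z)            # tanh-bounded, unevenly used bins
q_ifsq = ifsq_quantize(z)          # uniform prior enforced before rounding
```

The only change between the two functions is the bounding activation, which matches the paper's claim that iFSQ is a one-line modification of FSQ; the specific choice of `erf` here is an assumption for roughly Gaussian latents.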