

iFSQ: Improving FSQ for Image Generation with 1 Line of Code

January 23, 2026
Authors: Bin Lin, Zongjian Li, Yuwei Niu, Kaixiong Gong, Yunyang Ge, Yunlong Lin, Mingzhe Zheng, JianWei Zhang, Miles Yang, Zhao Zhong, Liefeng Bo, Li Yuan
cs.AI

Abstract

The field of image generation is currently bifurcated into autoregressive (AR) models operating on discrete tokens and diffusion models utilizing continuous latents. This divide, rooted in the distinction between VQ-VAEs and VAEs, hinders unified modeling and fair benchmarking. Finite Scalar Quantization (FSQ) offers a theoretical bridge, yet vanilla FSQ suffers from a critical flaw: its equal-interval quantization can cause activation collapse, forcing a trade-off between reconstruction fidelity and information efficiency. In this work, we resolve this dilemma by simply replacing the activation function in the original FSQ with a distribution-matching mapping that enforces a uniform prior. Termed iFSQ, this strategy requires just one line of code yet mathematically guarantees both optimal bin utilization and reconstruction precision. Leveraging iFSQ as a controlled benchmark, we uncover two key insights: (1) the optimal equilibrium between discrete and continuous representations lies at approximately 4 bits per dimension; (2) under identical reconstruction constraints, AR models exhibit rapid initial convergence, whereas diffusion models achieve a superior performance ceiling, suggesting that strict sequential ordering may limit the upper bound of generation quality. Finally, we extend our analysis by adapting Representation Alignment (REPA) to AR models, yielding LlamaGen-REPA. Code is available at https://github.com/Tencent-Hunyuan/iFSQ
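
To make the "one line of code" idea concrete, below is a minimal sketch of an FSQ-style quantizer with the swap the abstract describes. It assumes the pre-quantization activations are roughly standard Gaussian and uses the Gaussian CDF (via erf) as the distribution-matching mapping that spreads values uniformly over (-1, 1); the function name, the choice of erf, and the straight-through gradient trick are illustrative assumptions, not the paper's verified implementation (see the official repository for that).

```python
import math
import torch

def fsq_quantize(z: torch.Tensor, levels: int = 16, distribution_matching: bool = True) -> torch.Tensor:
    """Quantize each latent dimension of `z` into `levels` equal-interval bins.

    Sketch only: the distribution-matching branch assumes roughly Gaussian
    pre-activations and maps them through the Gaussian CDF so that the
    bounded values are approximately uniform on (-1, 1); the paper's exact
    mapping may differ.
    """
    if distribution_matching:
        # iFSQ-style one-line swap: ~N(0, 1) activations -> ~Uniform(-1, 1),
        # so every quantization bin is used with roughly equal probability.
        bounded = torch.erf(z / math.sqrt(2.0))
    else:
        # Vanilla FSQ: tanh bounds the values but concentrates mass near 0,
        # leaving the outer bins underused (activation collapse).
        bounded = torch.tanh(z)

    half = (levels - 1) / 2.0
    quantized = torch.round(bounded * half) / half
    # Straight-through estimator so gradients flow through the rounding step.
    return bounded + (quantized - bounded).detach()
```

Under this reading, the two variants differ only in the line that bounds the activations, which is what makes controlled comparisons between discrete (AR) and continuous (diffusion) generators over the same latent space possible.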