Spherical Leech Quantization for Visual Tokenization and Generation

December 16, 2025
Authors: Yue Zhao, Hanwen Jiang, Zhenlin Xu, Chutong Yang, Ehsan Adeli, Philipp Krähenbühl
cs.AI

Abstract

Non-parametric quantization has received much attention for its parameter efficiency and its scalability to large codebooks. In this paper, we present a unified formulation of different non-parametric quantization methods through the lens of lattice coding. The geometry of lattice codes explains why auxiliary loss terms are necessary when training auto-encoders with certain existing lookup-free quantization variants such as BSQ. As a step forward, we explore several candidates, including random lattices, generalized Fibonacci lattices, and densest sphere packing lattices. Among them, we find that the Leech lattice-based quantization method, dubbed Spherical Leech Quantization (Λ_{24}-SQ), leads to both a simplified training recipe and an improved reconstruction-compression tradeoff, thanks to its high symmetry and even distribution on the hypersphere. In image tokenization and compression tasks, this quantization approach achieves better reconstruction quality across all metrics than BSQ, the best prior art, while consuming slightly fewer bits. The improvement also extends to state-of-the-art auto-regressive image generation frameworks.
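As a rough illustration of the setting the abstract describes, the sketch below shows two spherical quantizers in plain Python: a BSQ-style binary quantizer, where the code is just the sign pattern of the normalized vector, and a generic nearest-codeword quantizer over any fixed set of unit vectors. All function names are hypothetical, the treatment of zero entries as positive is an assumption, and the toy codebook in the usage note is a stand-in: the Leech lattice codebook itself is not constructed here, and this is not the authors' implementation.

```python
import math


def l2_normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def bsq_quantize(v):
    """BSQ-style sketch: keep only the sign pattern, rescaled to unit norm.

    Assumption: entries equal to zero are treated as positive.
    """
    u = l2_normalize(v)
    scale = 1.0 / math.sqrt(len(u))
    return [scale if x >= 0 else -scale for x in u]


def bsq_index(v):
    """Integer token implied by the sign pattern; no lookup table is needed."""
    return sum(1 << i for i, x in enumerate(v) if x >= 0)


def spherical_quantize(v, codebook):
    """Generic spherical quantizer: pick the unit-norm codeword with the
    highest cosine similarity to the normalized input.

    `codebook` is any fixed list of unit vectors (a stand-in here).
    """
    u = l2_normalize(v)
    sims = [sum(a * b for a, b in zip(u, c)) for c in codebook]
    best = max(range(len(codebook)), key=sims.__getitem__)
    return best, codebook[best]
```

For example, `bsq_quantize([3.0, -4.0])` returns a unit vector with one positive and one negative entry of equal magnitude, and `spherical_quantize([0.9, 0.1], [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])` selects the first codeword. Because the codebook is fixed rather than learned, neither quantizer adds trainable parameters, which is the sense in which such methods are called non-parametric.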