
SNCE: Geometry-Aware Supervision for Scalable Discrete Image Generation

March 16, 2026
Authors: Shufan Li, Jiuxiang Gu, Kangning Liu, Zhe Lin, Aditya Grover, Jason Kuen
cs.AI

Abstract

Recent advances in discrete image generation have shown that scaling the VQ codebook size significantly improves reconstruction fidelity. However, training generative models with a large VQ codebook remains challenging, typically requiring a larger model and a longer training schedule. In this work, we propose Stochastic Neighbor Cross Entropy Minimization (SNCE), a novel training objective designed to address the optimization challenges of large-codebook discrete image generators. Instead of supervising the model with a hard one-hot target, SNCE constructs a soft categorical distribution over a set of neighboring tokens. The probability assigned to each token is proportional to the proximity between its code embedding and the ground-truth image embedding, encouraging the model to capture semantically meaningful geometric structure in the quantized embedding space. We conduct extensive experiments across class-conditional ImageNet-256 generation, large-scale text-to-image synthesis, and image editing tasks. Results show that SNCE significantly improves convergence speed and overall generation quality compared to standard cross-entropy objectives.
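The soft-target construction described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: it places probability mass on the k nearest codebook entries via a softmax over negative squared distances with temperature `tau` (the paper's exact neighborhood size and proximity kernel may differ).

```python
import torch
import torch.nn.functional as F

def snce_targets(codebook, z, k=8, tau=0.1):
    """Soft categorical targets over the k codebook entries nearest to z.

    codebook: (K, d) code embeddings; z: (B, d) ground-truth embeddings.
    Returns: (B, K) distributions, nonzero only on each z's k neighbors.
    """
    # Squared distances between each ground-truth embedding and every code.
    d2 = torch.cdist(z, codebook).pow(2)                  # (B, K)
    # Keep only the k nearest codebook entries per embedding.
    knn_d2, knn_idx = d2.topk(k, dim=-1, largest=False)   # (B, k)
    # Closer codes receive proportionally more probability mass.
    weights = F.softmax(-knn_d2 / tau, dim=-1)            # (B, k)
    targets = torch.zeros_like(d2)
    targets.scatter_(-1, knn_idx, weights)
    return targets

def snce_loss(logits, targets):
    """Cross-entropy between model logits and the soft SNCE targets."""
    return -(targets * F.log_softmax(logits, dim=-1)).sum(-1).mean()
```

With `tau -> 0` the targets collapse to the standard one-hot cross-entropy objective, so the temperature controls how much geometric structure of the quantized embedding space leaks into the supervision signal.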