BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities

October 18, 2024
Authors: Shaozhe Hao, Xuantong Liu, Xianbiao Qi, Shihao Zhao, Bojia Zi, Rong Xiao, Kai Han, Kwan-Yee K. Wong
cs.AI

Abstract

We introduce BiGR, a novel conditional image generation model using compact binary latent codes for generative training, focusing on enhancing both generation and representation capabilities. BiGR is the first conditional generative model that unifies generation and discrimination within the same framework. BiGR features a binary tokenizer, a masked modeling mechanism, and a binary transcoder for binary code prediction. Additionally, we introduce a novel entropy-ordered sampling method to enable efficient image generation. Extensive experiments validate BiGR's superior performance in generation quality, as measured by FID-50k, and representation capabilities, as evidenced by linear-probe accuracy. Moreover, BiGR showcases zero-shot generalization across various vision tasks, enabling applications such as image inpainting, outpainting, editing, interpolation, and enrichment, without the need for structural modifications. Our findings suggest that BiGR unifies generative and discriminative tasks effectively, paving the way for further advancements in the field.
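
The abstract names an entropy-ordered sampling method but does not say how it works. The sketch below is one plausible reading, not the authors' algorithm. It assumes each token is a short binary code, that a model returns an independent Bernoulli probability for every bit, and that masked tokens are revealed lowest-entropy first on a linear schedule. The names `predict_probs`, `toy_predictor`, and `entropy_ordered_sample` are hypothetical.

```python
import math
import random


def bernoulli_entropy(p):
    """Entropy in nats of a single Bernoulli(p) bit; zero when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def token_entropy(bit_probs):
    """Entropy of one binary token, assuming its bits are independent."""
    return sum(bernoulli_entropy(p) for p in bit_probs)


def entropy_ordered_sample(predict_probs, num_tokens, num_steps, rng):
    """Iteratively unmask binary tokens, most confident (lowest entropy) first.

    predict_probs(tokens) is a hypothetical stand-in for the model: it takes
    the current token list (None marks a masked token) and returns, for every
    position, a list of P(bit = 1) values.
    """
    tokens = [None] * num_tokens
    for step in range(num_steps):
        masked = [i for i, t in enumerate(tokens) if t is None]
        if not masked:
            break
        probs = predict_probs(tokens)
        # Assumed linear schedule: reveal an equal share of what remains,
        # so the final step always unmasks everything left.
        k = math.ceil(len(masked) / (num_steps - step))
        masked.sort(key=lambda i: token_entropy(probs[i]))
        for i in masked[:k]:
            # Sample each bit from its predicted Bernoulli distribution.
            tokens[i] = [1 if rng.random() < p else 0 for p in probs[i]]
    return tokens


def toy_predictor(tokens):
    """Fake model: 4-bit tokens whose bit probability depends on position."""
    n = len(tokens)
    return [[(i + 1) / (n + 1)] * 4 for i in range(n)]


codes = entropy_ordered_sample(toy_predictor, num_tokens=8, num_steps=3,
                               rng=random.Random(0))
```

Here `codes` is a list of eight 4-bit tokens. A real system would replace `toy_predictor` with the trained model and decode the binary codes back to pixels with the tokenizer's decoder.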
