Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation
September 6, 2024
Authors: Zhuoyan Luo, Fengyuan Shi, Yixiao Ge, Yujiu Yang, Limin Wang, Ying Shan
cs.AI
Abstract
We present Open-MAGVIT2, a family of auto-regressive image generation models
ranging from 300M to 1.5B parameters. The Open-MAGVIT2 project provides an open-source
replication of Google's MAGVIT-v2 tokenizer, a tokenizer with a super-large
codebook (i.e., $2^{18}$ codes), and achieves state-of-the-art
reconstruction performance (1.17 rFID) on ImageNet $256 \times 256$.
Furthermore, we explore its application in plain auto-regressive models and
validate scalability properties. To assist auto-regressive models in predicting
with a super-large vocabulary, we factorize it into two sub-vocabularies of
different sizes by asymmetric token factorization, and further introduce "next
sub-token prediction" to enhance sub-token interaction for better generation
quality. We release all models and code to foster innovation and creativity in
the field of auto-regressive visual generation.
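
To make the idea of asymmetric token factorization concrete, here is a minimal sketch of how one index from a $2^{18}$-entry vocabulary could be split into two sub-tokens drawn from sub-vocabularies of different sizes. The abstract does not state the sub-vocabulary sizes, so the 6-bit / 12-bit split is an assumption chosen for illustration. The function names `factorize` and `merge` are also hypothetical and are not taken from the released code.

```python
from typing import Tuple

TOTAL_BITS = 18                     # vocabulary of 2**18 codes, as stated in the abstract
HIGH_BITS = 6                       # ASSUMED size of the smaller sub-vocabulary (2**6 = 64)
LOW_BITS = TOTAL_BITS - HIGH_BITS   # remaining 12 bits (2**12 = 4096)


def factorize(token: int) -> Tuple[int, int]:
    """Split one 18-bit token index into a (high, low) pair of sub-tokens."""
    if not 0 <= token < (1 << TOTAL_BITS):
        raise ValueError("token index out of range for an 18-bit vocabulary")
    high = token >> LOW_BITS               # top 6 bits
    low = token & ((1 << LOW_BITS) - 1)    # bottom 12 bits
    return high, low


def merge(high: int, low: int) -> int:
    """Recombine two sub-tokens into the original token index."""
    return (high << LOW_BITS) | low


if __name__ == "__main__":
    high, low = factorize(123456)
    print(high, low, merge(high, low))
```

Under this assumed split, a model would predict from vocabularies of 64 and 4096 entries instead of one of 262,144 entries. How the two sub-tokens are conditioned on each other ("next sub-token prediction") is not covered by this sketch.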