Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation
September 6, 2024
Authors: Zhuoyan Luo, Fengyuan Shi, Yixiao Ge, Yujiu Yang, Limin Wang, Ying Shan
cs.AI
Abstract
We present Open-MAGVIT2, a family of auto-regressive image generation models
ranging from 300M to 1.5B parameters. The Open-MAGVIT2 project produces an
open-source replication of Google's MAGVIT-v2 tokenizer, a tokenizer with a
super-large codebook (i.e., 2^{18} codes), and achieves state-of-the-art
reconstruction performance (1.17 rFID) on ImageNet 256×256. Furthermore, we
explore its application in plain auto-regressive models and validate their
scalability properties. To help auto-regressive models predict over a
super-large vocabulary, we factorize it into two sub-vocabularies of different
sizes via asymmetric token factorization, and further introduce "next
sub-token prediction" to enhance sub-token interaction for better generation
quality. We release all models and code to foster innovation and creativity in
the field of auto-regressive visual generation.
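The asymmetric token factorization mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes that a token index in [0, 2^18) is treated as an 18-bit integer and split by bits into a 6-bit and a 12-bit sub-token. The 6/12 split and the helper names `factorize`, `combine`, and `head_sizes` are hypothetical choices for illustration, not taken from the released code.

```python
from typing import Tuple

# Assumed split (illustrative only): 18-bit vocabulary = 6-bit + 12-bit sub-vocabularies.
TOTAL_BITS = 18
BITS_A = 6                      # first sub-vocabulary: 2**6 = 64 entries
BITS_B = TOTAL_BITS - BITS_A    # second sub-vocabulary: 2**12 = 4096 entries


def factorize(token: int) -> Tuple[int, int]:
    """Split a full token index into (sub-token A, sub-token B).

    Sub-token A holds the top BITS_A bits, sub-token B the low BITS_B bits.
    """
    if not 0 <= token < 2 ** TOTAL_BITS:
        raise ValueError(f"token {token} is outside [0, 2**{TOTAL_BITS})")
    sub_a = token >> BITS_B               # keep the high bits
    sub_b = token & ((1 << BITS_B) - 1)   # mask out everything but the low bits
    return sub_a, sub_b


def combine(sub_a: int, sub_b: int) -> int:
    """Inverse of factorize(): rebuild the full token index."""
    if not (0 <= sub_a < 2 ** BITS_A and 0 <= sub_b < 2 ** BITS_B):
        raise ValueError("sub-token out of range")
    return (sub_a << BITS_B) | sub_b


def head_sizes() -> Tuple[int, int]:
    """Logits needed by one flat output head vs. two factorized heads."""
    flat = 2 ** TOTAL_BITS
    factorized = 2 ** BITS_A + 2 ** BITS_B
    return flat, factorized


if __name__ == "__main__":
    print(factorize(200_000))   # (48, 3392)
    print(combine(48, 3392))    # 200000
    print(head_sizes())         # (262144, 4160)
```

Under this assumed split, the prediction heads need 64 + 4096 = 4160 logits instead of 262,144. "Next sub-token prediction" can then be read as predicting sub-token A first and sub-token B conditioned on it, i.e. p(t) = p(a) · p(b | a); the exact conditioning mechanism used by Open-MAGVIT2 is not specified here.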