CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets
May 30, 2024
Authors: Longwen Zhang, Ziyu Wang, Qixuan Zhang, Qiwei Qiu, Anqi Pang, Haoran Jiang, Wei Yang, Lan Xu, Jingyi Yu
cs.AI
Abstract
In the realm of digital creativity, our potential to craft intricate 3D
worlds from imagination is often hampered by the limitations of existing
digital tools, which demand extensive expertise and effort. To narrow this
disparity, we introduce CLAY, a 3D geometry and material generator designed to
effortlessly transform human imagination into intricate 3D digital structures.
CLAY supports classic text or image inputs as well as 3D-aware controls from
diverse primitives (multi-view images, voxels, bounding boxes, point clouds,
implicit representations, etc.). At its core is a large-scale generative model
composed of a multi-resolution Variational Autoencoder (VAE) and a minimalistic
latent Diffusion Transformer (DiT), to extract rich 3D priors directly from a
diverse range of 3D geometries. Specifically, it adopts neural fields to
represent continuous and complete surfaces and uses a geometry generative
module with pure transformer blocks in latent space. We present a progressive
training scheme to train CLAY on an ultra-large 3D model dataset obtained
through a carefully designed processing pipeline, resulting in a 3D-native
geometry generator with 1.5 billion parameters. For appearance generation, CLAY
sets out to produce physically-based rendering (PBR) textures by employing a
multi-view material diffusion model that can generate 2K resolution textures
with diffuse, roughness, and metallic modalities. We demonstrate using CLAY for
a range of controllable 3D asset creations, from sketchy conceptual designs to
production-ready assets with intricate details. Even first-time users can
easily use CLAY to bring their vivid 3D imaginations to life, unleashing
unlimited creativity.