CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets

May 30, 2024
Authors: Longwen Zhang, Ziyu Wang, Qixuan Zhang, Qiwei Qiu, Anqi Pang, Haoran Jiang, Wei Yang, Lan Xu, Jingyi Yu
cs.AI

Abstract

In the realm of digital creativity, our potential to craft intricate 3D worlds from imagination is often hampered by the limitations of existing digital tools, which demand extensive expertise and effort. To narrow this gap, we introduce CLAY, a 3D geometry and material generator designed to effortlessly transform human imagination into intricate 3D digital structures. CLAY supports classic text or image inputs as well as 3D-aware controls from diverse primitives (multi-view images, voxels, bounding boxes, point clouds, implicit representations, etc.). At its core is a large-scale generative model composed of a multi-resolution Variational Autoencoder (VAE) and a minimalistic latent Diffusion Transformer (DiT), which extracts rich 3D priors directly from a diverse range of 3D geometries. Specifically, it adopts neural fields to represent continuous and complete surfaces and uses a geometry generative module with pure transformer blocks in latent space. We present a progressive training scheme to train CLAY on an ultra-large 3D model dataset obtained through a carefully designed processing pipeline, resulting in a 3D-native geometry generator with 1.5 billion parameters. For appearance generation, CLAY produces physically-based rendering (PBR) textures by employing a multi-view material diffusion model that can generate 2K-resolution textures with diffuse, roughness, and metallic modalities. We demonstrate using CLAY for a range of controllable 3D asset creation tasks, from sketchy conceptual designs to production-ready assets with intricate details. Even first-time users can easily use CLAY to bring their vivid 3D imaginations to life, unleashing unlimited creativity.
