

Compress3D: a Compressed Latent Space for 3D Generation from a Single Image

March 20, 2024
作者: Bowen Zhang, Tianyu Yang, Yu Li, Lei Zhang, Xi Zhao
cs.AI

Abstract

3D generation has witnessed significant advancements, yet efficiently producing high-quality 3D assets from a single image remains challenging. In this paper, we present a triplane autoencoder, which encodes 3D models into a compact triplane latent space to effectively compress both the 3D geometry and texture information. Within the autoencoder framework, we introduce a 3D-aware cross-attention mechanism, which utilizes low-resolution latent representations to query features from a high-resolution 3D feature volume, thereby enhancing the representation capacity of the latent space. Subsequently, we train a diffusion model on this refined latent space. In contrast to solely relying on image embedding for 3D generation, our proposed method advocates for the simultaneous utilization of both image embedding and shape embedding as conditions. Specifically, the shape embedding is estimated via a diffusion prior model conditioned on the image embedding. Through comprehensive experiments, we demonstrate that our method outperforms state-of-the-art algorithms, achieving superior performance while requiring less training data and time. Our approach enables the generation of high-quality 3D assets in merely 7 seconds on a single A100 GPU.
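The 3D-aware cross-attention described above can be illustrated with a minimal sketch: low-resolution triplane latent tokens act as queries against a flattened high-resolution 3D feature volume, and the attended features are added back to enrich the compact latent. This is a toy NumPy approximation, not the paper's implementation; the shapes, the random stand-in projection weights (`Wq`, `Wk`, `Wv`), and the residual update are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the key dimension.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def triplane_cross_attention(latent_tokens, volume_feats, d_k=16, seed=0):
    """Hypothetical sketch of 3D-aware cross-attention:
    low-res triplane latent tokens (queries) attend over a flattened
    high-res 3D feature volume (keys/values). The projection matrices
    are random stand-ins for learned parameters."""
    rng = np.random.default_rng(seed)
    n_q, c_q = latent_tokens.shape      # (num latent tokens, latent channels)
    n_kv, c_kv = volume_feats.shape     # (num voxels, volume channels)
    Wq = rng.standard_normal((c_q, d_k)) / np.sqrt(c_q)
    Wk = rng.standard_normal((c_kv, d_k)) / np.sqrt(c_kv)
    Wv = rng.standard_normal((c_kv, c_q)) / np.sqrt(c_kv)
    Q, K, V = latent_tokens @ Wq, volume_feats @ Wk, volume_feats @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)
    # Residual update: inject high-resolution detail into the compact latent.
    return latent_tokens + attn @ V

# Toy shapes: one 32x32 latent plane (flattened) querying a 16^3 feature volume.
latent = np.zeros((32 * 32, 8))
volume = np.ones((16 ** 3, 32))
out = triplane_cross_attention(latent, volume)
```

The point of the construction is that attention cost scales with the small number of latent tokens rather than the full voxel grid, which is what lets the latent space stay compact while still seeing high-resolution geometry and texture features.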

