Compress3D: a Compressed Latent Space for 3D Generation from a Single Image
March 20, 2024
Authors: Bowen Zhang, Tianyu Yang, Yu Li, Lei Zhang, Xi Zhao
cs.AI
Abstract
3D generation has witnessed significant advancements, yet efficiently
producing high-quality 3D assets from a single image remains challenging. In
this paper, we present a triplane autoencoder, which encodes 3D models into a
compact triplane latent space to effectively compress both the 3D geometry and
texture information. Within the autoencoder framework, we introduce a 3D-aware
cross-attention mechanism, which utilizes low-resolution latent representations
to query features from a high-resolution 3D feature volume, thereby enhancing
the representation capacity of the latent space. Subsequently, we train a
diffusion model on this refined latent space. In contrast to relying solely on
image embeddings for 3D generation, our method conditions generation on both an
image embedding and a shape embedding. Specifically, the shape embedding is
estimated via a diffusion prior model conditioned on the image embedding.
Through comprehensive
experiments, we demonstrate that our method outperforms state-of-the-art
algorithms, achieving superior performance while requiring less training data
and time. Our approach enables the generation of high-quality 3D assets in
merely 7 seconds on a single A100 GPU.
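
To make the 3D-aware cross-attention concrete, below is a minimal PyTorch sketch of the core idea: flattened low-resolution triplane latent tokens act as queries, while voxels of a high-resolution 3D feature volume supply the keys and values. The module name, dimensions, and the use of plain global cross-attention are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class TriplaneCrossAttention(nn.Module):
    """Sketch: low-res triplane latents query a high-res 3D feature volume.

    All names and dimensions are assumptions for illustration, not the
    authors' code. Global attention stands in for the 3D-aware mechanism.
    """
    def __init__(self, latent_dim=64, volume_dim=128, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            embed_dim=latent_dim, kdim=volume_dim, vdim=volume_dim,
            num_heads=num_heads, batch_first=True)

    def forward(self, latent, volume):
        # latent: (B, C, H, W) for one plane -> query tokens (B, H*W, C)
        B, C, H, W = latent.shape
        q = latent.flatten(2).transpose(1, 2)
        # volume: (B, Cv, D, D, D) -> key/value tokens (B, D^3, Cv)
        kv = volume.flatten(2).transpose(1, 2)
        # each low-res latent token attends to high-res 3D features
        out, _ = self.attn(q, kv, kv)
        return out.transpose(1, 2).view(B, C, H, W)

# toy usage: a 32x32 triplane latent querying a 16^3 feature volume
latent = torch.randn(1, 64, 32, 32)
volume = torch.randn(1, 128, 16, 16, 16)
refined = TriplaneCrossAttention()(latent, volume)
print(refined.shape)  # torch.Size([1, 64, 32, 32])
```

Per the abstract, this refinement happens inside the autoencoder, so the compact latent space retains high-resolution geometry and texture cues.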
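The two-stage conditioning can likewise be sketched as a data-flow stub: a diffusion prior first maps the image embedding to an estimated shape embedding, and the latent diffusion model is then conditioned on both. The function names, embedding sizes, and stub bodies below are hypothetical placeholders; only the overall flow follows the abstract.

```python
import torch

def diffusion_prior(image_emb: torch.Tensor) -> torch.Tensor:
    # Stage 1 (stub): estimate a shape embedding from the image embedding.
    # A real prior would run iterative denoising; we return a placeholder.
    return torch.randn(image_emb.shape[0], 512)

def latent_diffusion(image_emb, shape_emb, steps=50):
    # Stage 2 (stub): denoise a compact triplane latent conditioned on
    # BOTH embeddings; the triplane autoencoder would then decode it.
    return torch.randn(image_emb.shape[0], 64, 3, 32, 32)  # (B, C, planes, H, W)

image_emb = torch.randn(1, 768)              # e.g. a CLIP-style image embedding
shape_emb = diffusion_prior(image_emb)       # image embedding -> shape embedding
triplane_latent = latent_diffusion(image_emb, shape_emb)
print(triplane_latent.shape)
```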