VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder

December 18, 2023
Authors: Zhicong Tang, Shuyang Gu, Chunyu Wang, Ting Zhang, Jianmin Bao, Dong Chen, Baining Guo
cs.AI

Abstract

This paper introduces a pioneering 3D volumetric encoder designed for text-to-3D generation. To scale up the training data for the diffusion model, a lightweight network is developed to efficiently acquire feature volumes from multi-view images. A diffusion model based on a 3D U-Net is then trained on these feature volumes for text-to-3D generation. This research further addresses the challenges of inaccurate object captions and high-dimensional feature volumes. The proposed model, trained on the public Objaverse dataset, demonstrates promising outcomes in producing diverse and recognizable samples from text prompts. Notably, it enables finer control over the characteristics of object parts through textual cues, fostering model creativity by seamlessly combining multiple concepts within a single object. This research contributes significantly to the progress of 3D generation by introducing an efficient, flexible, and scalable representation methodology. Code is available at https://github.com/tzco/VolumeDiffusion.
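The core training setup described above — a diffusion model operating directly on feature volumes — can be illustrated with a minimal forward-noising sketch in NumPy. The volume shape `(C, D, H, W)`, the cosine noise schedule, and all function names below are illustrative assumptions for exposition, not the paper's actual implementation (which would use a learned 3D U-Net denoiser on encoder outputs).

```python
import numpy as np

def cosine_alpha_bar(t, T, s=0.008):
    """Cumulative signal-retention schedule (cosine schedule, assumed here)."""
    f = lambda u: np.cos((u / T + s) / (1 + s) * np.pi / 2) ** 2
    return f(t) / f(0)

def noise_volume(x0, t, T, rng):
    """Forward diffusion q(x_t | x_0) applied to a 3D feature volume.

    x0: clean feature volume, shape (C, D, H, W) -- hypothetical layout.
    Returns the noised volume x_t and the noise eps the denoiser would predict.
    """
    a_bar = cosine_alpha_bar(t, T)
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    return xt, eps

rng = np.random.default_rng(0)
# Illustrative 4-channel 32^3 volume standing in for the encoder's output.
vol = rng.standard_normal((4, 32, 32, 32))
xt, eps = noise_volume(vol, t=500, T=1000, rng=rng)
```

In training, the 3D U-Net would receive `xt`, the timestep `t`, and a text embedding, and be optimized to recover `eps`; sampling then reverses this process from pure noise.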