VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder
December 18, 2023
Authors: Zhicong Tang, Shuyang Gu, Chunyu Wang, Ting Zhang, Jianmin Bao, Dong Chen, Baining Guo
cs.AI
Abstract
This paper introduces a pioneering 3D volumetric encoder designed for
text-to-3D generation. To scale up the training data for the diffusion model, a
lightweight network is developed to efficiently acquire feature volumes from
multi-view images. A diffusion model built on a 3D U-Net is then trained on
these 3D volumes for text-to-3D generation. This research further addresses the
challenges of inaccurate object captions and high-dimensional feature volumes.
The proposed model, trained on the public Objaverse dataset, demonstrates
promising outcomes in producing diverse and recognizable samples from text
prompts. Notably, it empowers finer control over object part characteristics
through textual cues, fostering model creativity by seamlessly combining
multiple concepts within a single object. This research significantly
contributes to the progress of 3D generation by introducing an efficient,
flexible, and scalable representation methodology. Code is available at
https://github.com/tzco/VolumeDiffusion.
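The two-stage pipeline the abstract describes (a lightweight encoder mapping multi-view images to a feature volume, then a diffusion model trained on those volumes) can be sketched in miniature. This is an illustrative sketch only: the function names, volume resolution, and channel count are assumptions, not the paper's actual architecture or API, and the encoder is a shape-correct placeholder rather than a real unprojection network.

```python
import numpy as np

def encode_multiview(images: np.ndarray, channels: int = 4, res: int = 32) -> np.ndarray:
    """Stand-in for the lightweight volumetric encoder: maps V multi-view
    images of shape (V, H, W, 3) to a feature volume (C, D, D, D).
    A real encoder would unproject image features along camera rays; here we
    only emit a correctly shaped placeholder volume (hypothetical sketch)."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((channels, res, res, res)).astype(np.float32)

def ddpm_forward(x0: np.ndarray, t: int, T: int = 1000) -> np.ndarray:
    """Standard DDPM forward (noising) step applied to a feature volume:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps.
    A 3D U-Net denoiser would be trained to predict eps from x_t and t."""
    betas = np.linspace(1e-4, 2e-2, T)          # linear noise schedule
    alpha_bar = np.cumprod(1.0 - betas)[t]      # cumulative signal fraction
    eps = np.random.default_rng(1).standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return x_t.astype(np.float32)

# Eight 256x256 views in, one compact feature volume out.
volume = encode_multiview(np.zeros((8, 256, 256, 3), dtype=np.float32))
noisy = ddpm_forward(volume, t=500)
print(volume.shape, noisy.shape)  # both (4, 32, 32, 32)
```

The key property motivating the paper's representation is visible in the shapes: the volume is far lower-dimensional than the input views, which is what makes training a 3D diffusion model over it tractable at scale.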