VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder

Abstract

This paper introduces a pioneering 3D volumetric encoder designed for text-to-3D generation. To scale up the training data for the diffusion model, a lightweight network is developed that efficiently extracts feature volumes from multi-view images. A diffusion model with a 3D U-Net is then trained on these feature volumes for text-to-3D generation. The work further addresses two challenges: inaccurate object captions and high-dimensional feature volumes. Trained on the public Objaverse dataset, the proposed model shows promising results in producing diverse and recognizable samples from text prompts. Notably, it allows finer control over the characteristics of object parts through textual cues and supports creative outputs by seamlessly combining multiple concepts within a single object. By introducing an efficient, flexible, and scalable representation, this work contributes significantly to the progress of 3D generation. Code is available at https://github.com/tzco/VolumeDiffusion.
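To make the idea of training a diffusion model on feature volumes more concrete, the following is a minimal sketch of the standard DDPM-style forward (noising) step applied to a volume. It is not the authors' implementation. The linear beta schedule, the volume shape (4 channels on an 8x8x8 grid), and the function names `linear_alpha_bar` and `add_noise` are all assumptions chosen for illustration, and the abstract does not specify the noise schedule actually used. A real implementation would use a tensor library rather than flat Python lists.

```python
import math
import random


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def add_noise(volume, t, alpha_bars, rng):
    """Sample x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps, elementwise."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in volume]
    noisy = [
        math.sqrt(a) * x + math.sqrt(1.0 - a) * e
        for x, e in zip(volume, noise)
    ]
    return noisy, noise


# Hypothetical feature volume: 4 channels on an 8x8x8 grid, stored flat.
rng = random.Random(0)
channels, size = 4, 8
x0 = [rng.uniform(-1.0, 1.0) for _ in range(channels * size ** 3)]

alpha_bars = linear_alpha_bar(1000)
x_t, eps = add_noise(x0, 500, alpha_bars, rng)
```

In a typical setup of this kind, a denoising network (here, a 3D U-Net conditioned on the text prompt) would be trained to predict `eps` from `x_t` and the timestep. That training loop is omitted because the abstract gives no details about it.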
