Instant3D: Instant Text-to-3D Generation

Abstract

Text-to-3D generation, which aims to synthesize vivid 3D objects from text prompts, has attracted much attention from the computer vision community. While several existing works have achieved impressive results on this task, they mainly rely on a time-consuming optimization paradigm. Specifically, these methods optimize a neural field from scratch for each text prompt, taking approximately one hour or more to generate a single object. This heavy and repetitive training cost impedes their practical deployment. In this paper, we propose a novel framework for fast text-to-3D generation, dubbed Instant3D. Once trained, Instant3D can create a 3D object for an unseen text prompt in less than one second with a single run of a feedforward network. We achieve this speed by devising a new network that directly constructs a 3D triplane from a text prompt. The core innovation of Instant3D lies in our exploration of strategies for effectively injecting text conditions into the network. Furthermore, we propose a simple yet effective activation function, the scaled-sigmoid, to replace the original sigmoid function, which speeds up training convergence by more than ten times. Finally, to address the Janus (multi-head) problem in 3D generation, we propose an adaptive Perp-Neg algorithm that dynamically adjusts its concept negation scales according to the severity of the Janus problem during training, effectively reducing the multi-head effect. Extensive experiments on a wide variety of benchmark datasets demonstrate that the proposed algorithm performs favorably against state-of-the-art methods both qualitatively and quantitatively, while achieving significantly better efficiency. The project page is at https://ming1993li.github.io/Instant3DProj.
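The abstract names the scaled-sigmoid but does not give its formula. As a rough illustration only, the sketch below assumes it takes the form sigmoid(αx) with a scaling factor α < 1. The function names and the value `alpha=0.05` are hypothetical placeholders, not taken from the text above. The sketch shows why such a rescaling could help optimization: for inputs far from zero, the plain sigmoid saturates and its gradient nearly vanishes, while the rescaled version keeps a usable gradient.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def scaled_sigmoid(x: float, alpha: float = 0.05) -> float:
    """Assumed form of a scaled-sigmoid: sigmoid(alpha * x).

    alpha is a hypothetical scaling factor; alpha < 1 widens the
    region of inputs where the function is not saturated.
    """
    return sigmoid(alpha * x)


def scaled_sigmoid_grad(x: float, alpha: float = 0.05) -> float:
    """Derivative of sigmoid(alpha * x) with respect to x."""
    s = sigmoid(alpha * x)
    return alpha * s * (1.0 - s)


if __name__ == "__main__":
    for x in (0.0, 5.0, 10.0, 20.0):
        plain = scaled_sigmoid_grad(x, alpha=1.0)  # ordinary sigmoid
        scaled = scaled_sigmoid_grad(x, alpha=0.05)
        print(f"x={x:5.1f}  plain grad={plain:.2e}  scaled grad={scaled:.2e}")
```

Under this assumed form, the output range stays in (0, 1), so the function could stand in wherever a sigmoid is used; only the slope profile changes. Near zero the plain sigmoid has the larger gradient, so the benefit is limited to inputs with large magnitude.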
