

Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model

November 10, 2023
作者: Jiahao Li, Hao Tan, Kai Zhang, Zexiang Xu, Fujun Luan, Yinghao Xu, Yicong Hong, Kalyan Sunkavalli, Greg Shakhnarovich, Sai Bi
cs.AI

Abstract

Text-to-3D generation with diffusion models has achieved remarkable progress in recent years. However, existing methods either rely on score distillation-based optimization, which suffers from slow inference, low diversity, and the Janus problem, or are feed-forward methods that produce low-quality results due to the scarcity of 3D training data. In this paper, we propose Instant3D, a novel method that generates high-quality and diverse 3D assets from text prompts in a feed-forward manner. We adopt a two-stage paradigm: first, a fine-tuned 2D text-to-image diffusion model generates a sparse set of four structured and consistent views from text in a single shot; then, a novel transformer-based sparse-view reconstructor directly regresses a NeRF from the generated images. Through extensive experiments, we demonstrate that our method can generate high-quality, diverse, and Janus-free 3D assets within 20 seconds, two orders of magnitude faster than previous optimization-based methods, which can take 1 to 10 hours. Our project webpage: https://jiahao.ai/instant3d/.
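To make the two-stage data flow concrete, below is a minimal, self-contained PyTorch sketch of the pipeline structure the abstract describes: one generation step that emits a 2x2 grid tiling four consistent views, followed by a transformer that regresses NeRF parameters from those views in a single forward pass. Everything here is an illustrative assumption, not the authors' code: the module names, dimensions, the linear "generator" stand-in for the fine-tuned diffusion model, and the flat parameter vector standing in for the paper's triplane NeRF.

```python
# Toy sketch of the Instant3D two-stage pipeline (illustrative only).
import torch
import torch.nn as nn

class ToyFourViewGenerator(nn.Module):
    """Stage 1 stand-in: maps a text embedding to one 2x2 grid image that
    tiles four structured, consistent views. (Real system: a fine-tuned
    2D text-to-image diffusion model, sampled once.)"""
    def __init__(self, text_dim=64, view_res=32):
        super().__init__()
        self.view_res = view_res
        self.net = nn.Linear(text_dim, 3 * (2 * view_res) ** 2)

    def forward(self, text_emb):
        r = self.view_res
        grid = self.net(text_emb).view(3, 2 * r, 2 * r)
        # Split the 2x2 grid into four per-view images with known cameras.
        return torch.stack([grid[:, :r, :r], grid[:, :r, r:],
                            grid[:, r:, :r], grid[:, r:, r:]])  # (4, 3, r, r)

class ToySparseViewReconstructor(nn.Module):
    """Stage 2 stand-in: a transformer over patch tokens from the four views
    that regresses a compact NeRF representation in one forward pass.
    (Real system: a triplane-based transformer reconstructor.)"""
    def __init__(self, patch=8, dim=128, nerf_params=3072):
        super().__init__()
        self.patch = patch
        self.embed = nn.Linear(3 * patch * patch, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, nerf_params)

    def forward(self, views):
        p = self.patch
        # Patchify all four views into one token sequence.
        tokens = (views.unfold(2, p, p).unfold(3, p, p)
                       .permute(0, 2, 3, 1, 4, 5)
                       .reshape(1, -1, 3 * p * p))
        h = self.encoder(self.embed(tokens))
        return self.head(h.mean(dim=1)).squeeze(0)  # pooled -> NeRF params

# Feed-forward text-to-3D: one generation step + one reconstruction pass,
# with no per-prompt optimization loop.
text_emb = torch.randn(64)                   # placeholder text embedding
views = ToyFourViewGenerator()(text_emb)     # stage 1: four consistent views
nerf = ToySparseViewReconstructor()(views)   # stage 2: regress NeRF params
print(views.shape, nerf.shape)               # (4, 3, 32, 32), (3072,)
```

The point of the sketch is the feed-forward structure: unlike score distillation, which optimizes a NeRF per prompt for hours, both stages here are single forward passes, which is what makes the reported ~20-second generation time possible.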