Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model

November 10, 2023
Authors: Jiahao Li, Hao Tan, Kai Zhang, Zexiang Xu, Fujun Luan, Yinghao Xu, Yicong Hong, Kalyan Sunkavalli, Greg Shakhnarovich, Sai Bi
cs.AI

Abstract

Text-to-3D generation with diffusion models has achieved remarkable progress in recent years. However, existing methods either rely on score distillation-based optimization, which suffers from slow inference, low diversity, and the Janus problem, or are feed-forward methods that produce low-quality results due to the scarcity of 3D training data. In this paper, we propose Instant3D, a novel method that generates high-quality and diverse 3D assets from text prompts in a feed-forward manner. We adopt a two-stage paradigm, which first generates a sparse set of four structured and consistent views from text in one shot with a fine-tuned 2D text-to-image diffusion model, and then directly regresses the NeRF from the generated images with a novel transformer-based sparse-view reconstructor. Through extensive experiments, we demonstrate that our method can generate high-quality, diverse, and Janus-free 3D assets within 20 seconds, which is two orders of magnitude faster than previous optimization-based methods that can take 1 to 10 hours. Our project webpage: https://jiahao.ai/instant3d/.
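
The abstract describes a two-stage feed-forward pipeline: a fine-tuned 2D diffusion model produces a grid of four consistent views from the text prompt, and a transformer-based reconstructor regresses a NeRF from those views. The sketch below is only a minimal illustration of that data flow, not the authors' released code; the function and class names (`generate_sparse_views`, `SparseViewReconstructor`), the 2x2-grid splitting, the use of learned triplane-style query tokens, and all shapes and hyperparameters are assumptions made for illustration.

```python
# Hypothetical sketch of the two-stage pipeline described in the abstract.
# All interfaces, names, and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn


def generate_sparse_views(prompt: str, diffusion_model) -> torch.Tensor:
    """Stage 1 (assumed interface): a fine-tuned text-to-image diffusion model
    returns one image laid out as a 2x2 grid of consistent views, which we
    split into a (4, 3, H, W) tensor of individual views."""
    grid = diffusion_model(prompt)               # (3, 2H, 2W), hypothetical call
    h, w = grid.shape[1] // 2, grid.shape[2] // 2
    return torch.stack([
        grid[:, :h, :w], grid[:, :h, w:],
        grid[:, h:, :w], grid[:, h:, w:],
    ])


class SparseViewReconstructor(nn.Module):
    """Stage 2 (toy stand-in): a transformer that maps tokens from the four
    generated views to a set of 3D feature tokens decoded by a small NeRF MLP."""

    def __init__(self, dim=256, n_tokens=3 * 32 * 32):
        super().__init__()
        self.patchify = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        self.triplane_tokens = nn.Parameter(torch.randn(n_tokens, dim))
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.nerf_mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                      nn.Linear(dim, 4))  # RGB + density per sample

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (4, 3, H, W) -> flattened image tokens of shape (1, N, dim)
        tokens = self.patchify(views).flatten(2).transpose(1, 2)
        tokens = tokens.reshape(1, -1, tokens.shape[-1])
        # learned 3D query tokens attend to the image tokens
        planes = self.decoder(self.triplane_tokens.unsqueeze(0), tokens)
        return planes  # features sampled from these would feed self.nerf_mlp


if __name__ == "__main__":
    views = torch.rand(4, 3, 256, 256)           # stand-in for stage-1 output
    features = SparseViewReconstructor()(views)
    print(features.shape)                        # torch.Size([1, 3072, 256])
```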