Instant3D: Instant Text-to-3D Generation

Abstract

Text-to-3D generation, which aims to synthesize vivid 3D objects from text prompts, has attracted much attention from the computer vision community. While several existing works have achieved impressive results for this task, they mainly rely on a time-consuming optimization paradigm. Specifically, these methods optimize a neural field from scratch for each text prompt, taking approximately one hour or more to generate one object. This heavy and repetitive training cost impedes their practical deployment. In this paper, we propose a novel framework for fast text-to-3D generation, dubbed Instant3D. Once trained, Instant3D is able to create a 3D object for an unseen text prompt in less than one second with a single run of a feedforward network. We achieve this remarkable speed by devising a new network that directly constructs a 3D triplane from a text prompt. The core innovation of Instant3D lies in our exploration of strategies to effectively inject text conditions into the network. Furthermore, we propose a simple yet effective activation function, the scaled-sigmoid, to replace the original sigmoid function, which speeds up training convergence by more than ten times. Finally, to address the Janus (multi-head) problem in 3D generation, we propose an adaptive Perp-Neg algorithm that dynamically adjusts its concept negation scales according to the severity of the Janus problem during training, effectively reducing the multi-head effect. Extensive experiments on a wide variety of benchmark datasets demonstrate that the proposed algorithm performs favorably against state-of-the-art methods both qualitatively and quantitatively, while achieving significantly better efficiency. The project page is at https://ming1993li.github.io/Instant3DProj.
