Instant3D: Instant Text-to-3D Generation

Abstract

Text-to-3D generation, which aims to synthesize vivid 3D objects from text prompts, has attracted much attention from the computer vision community. While several existing works have achieved impressive results on this task, they mainly rely on a time-consuming optimization paradigm. Specifically, these methods optimize a neural field from scratch for each text prompt, taking approximately one hour or more to generate a single object. This heavy and repetitive training cost impedes their practical deployment. In this paper, we propose a novel framework for fast text-to-3D generation, dubbed Instant3D. Once trained, Instant3D is able to create a 3D object for an unseen text prompt in less than one second with a single run of a feedforward network. We achieve this remarkable speed by devising a new network that directly constructs a 3D triplane from a text prompt. The core innovation of Instant3D lies in our exploration of strategies to effectively inject text conditions into the network. Furthermore, we propose a simple yet effective activation function, the scaled-sigmoid, to replace the original sigmoid function, which speeds up training convergence by more than ten times. Finally, to address the Janus (multi-head) problem in 3D generation, we propose an adaptive Perp-Neg algorithm that dynamically adjusts its concept negation scales according to the severity of the Janus problem during training, effectively reducing the multi-head effect. Extensive experiments on a wide variety of benchmark datasets demonstrate that the proposed algorithm performs favorably against state-of-the-art methods both qualitatively and quantitatively, while achieving significantly better efficiency. The project page is at https://ming1993li.github.io/Instant3DProj.
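
For readers unfamiliar with the triplane representation mentioned above, the following is a minimal sketch of how a 3D point is commonly looked up in a triplane: the point is projected onto three axis-aligned feature planes, each plane is sampled bilinearly, and the three feature vectors are combined. This is not taken from the Instant3D implementation. The function names (`bilinear_sample`, `query_triplane`), the `[-1, 1]` coordinate convention, and the choice to sum the three features (rather than concatenate them) are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

# plane[row][col] is a feature vector (list of floats) of length C.
Plane = List[List[List[float]]]


def bilinear_sample(plane: Plane, u: float, v: float) -> List[float]:
    """Bilinearly sample a feature plane at (u, v) in [-1, 1]^2.

    u indexes columns and v indexes rows (an assumed convention).
    Coordinates outside [-1, 1] are clamped to the plane border.
    """
    height, width = len(plane), len(plane[0])
    # Map [-1, 1] to continuous pixel coordinates [0, size - 1].
    x = min(max((u + 1.0) * 0.5 * (width - 1), 0.0), float(width - 1))
    y = min(max((v + 1.0) * 0.5 * (height - 1), 0.0), float(height - 1))
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    tx, ty = x - x0, y - y0
    channels = len(plane[0][0])
    return [
        (1 - tx) * (1 - ty) * plane[y0][x0][c]
        + tx * (1 - ty) * plane[y0][x1][c]
        + (1 - tx) * ty * plane[y1][x0][c]
        + tx * ty * plane[y1][x1][c]
        for c in range(channels)
    ]


def query_triplane(plane_xy: Plane, plane_xz: Plane, plane_yz: Plane,
                   point: Sequence[float]) -> List[float]:
    """Return the feature of a 3D point by projecting it onto three planes.

    Features are summed here; concatenation is another common choice.
    """
    x, y, z = point
    f_xy = bilinear_sample(plane_xy, x, y)
    f_xz = bilinear_sample(plane_xz, x, z)
    f_yz = bilinear_sample(plane_yz, y, z)
    return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]


if __name__ == "__main__":
    # Toy 2x2 planes with one channel each (hypothetical values).
    ramp = [[[0.0], [1.0]], [[2.0], [3.0]]]
    ones = [[[1.0], [1.0]], [[1.0], [1.0]]]
    print(query_triplane(ramp, ones, ones, (0.0, 0.0, 0.0)))
```

In a text-to-3D pipeline, the resulting feature would typically be passed to a small decoder that predicts density and color for volume rendering; that stage is omitted here.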
