HyperFields: Towards Zero-Shot Generation of NeRFs from Text

Abstract

We introduce HyperFields, a method for generating text-conditioned Neural Radiance Fields (NeRFs) with a single forward pass and, optionally, some fine-tuning. Key to our approach are: (i) a dynamic hypernetwork, which learns a smooth mapping from text token embeddings to the space of NeRFs; and (ii) NeRF distillation training, which distills scenes encoded in individual NeRFs into one dynamic hypernetwork. These techniques enable a single network to fit over a hundred unique scenes. We further demonstrate that HyperFields learns a more general map between text and NeRFs and, consequently, can predict novel in-distribution and out-of-distribution scenes, either zero-shot or with a few fine-tuning steps. Thanks to the learned general map, fine-tuning HyperFields converges faster and synthesizes novel scenes 5 to 10 times faster than existing neural optimization-based methods. Our ablation experiments show that both the dynamic architecture and NeRF distillation are critical to the expressivity of HyperFields.
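To make the two components more concrete, here is a minimal, hypothetical sketch in plain Python. It rests on two assumptions that are not stated in the abstract: that "dynamic" means each NeRF layer's weights are generated from the text embedding together with a summary (here, the mean) of the previous layer's activations, and that NeRF distillation regresses the hypernetwork-generated field onto a pretrained per-scene teacher NeRF with a mean squared error. `ToyDynamicHypernetwork`, `distillation_loss`, and `teacher_nerf` are illustrative names, not the authors' code, and the sketch only evaluates the distillation objective; the optimizer step that would minimize it is omitted.

```python
import random

def matvec(W, x):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

class ToyDynamicHypernetwork:
    """Hypothetical toy model, not the HyperFields implementation.

    Generates the weights of a small NeRF-style MLP layer by layer. Each
    layer's weights are predicted from the text embedding concatenated with
    the mean activation of the previous layer (assumed reading of "dynamic").
    """

    def __init__(self, text_dim, layer_sizes, seed=0):
        rng = random.Random(seed)
        self.layer_sizes = layer_sizes  # e.g. [3, 16, 4]: xyz -> hidden -> (r, g, b, sigma)
        self.generators = []
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
            cond_dim = text_dim + n_in
            # One linear generator per layer: condition -> flattened n_out x n_in weights.
            gen = [[rng.gauss(0.0, 0.1) for _ in range(cond_dim)]
                   for _ in range(n_out * n_in)]
            self.generators.append(gen)

    def forward(self, text_emb, points):
        """Return one output vector per input point for the given text embedding."""
        acts = [list(p) for p in points]
        n_layers = len(self.generators)
        for i, gen in enumerate(self.generators):
            n_in, n_out = self.layer_sizes[i], self.layer_sizes[i + 1]
            # Summarize the previous layer's activations over all sampled points.
            mean_act = [sum(col) / len(acts) for col in zip(*acts)]
            flat = matvec(gen, list(text_emb) + mean_act)
            weights = [flat[r * n_in:(r + 1) * n_in] for r in range(n_out)]
            acts = [matvec(weights, a) for a in acts]
            if i < n_layers - 1:  # no nonlinearity on the output layer
                acts = [relu(a) for a in acts]
        return acts

def distillation_loss(student_out, teacher_out):
    """Mean squared error between hypernetwork outputs and a teacher NeRF's outputs."""
    sq = [(s - t) ** 2
          for s_vec, t_vec in zip(student_out, teacher_out)
          for s, t in zip(s_vec, t_vec)]
    return sum(sq) / len(sq)

# Usage: compare the generated field against a stand-in teacher NeRF for one scene.
def teacher_nerf(p):
    """Hypothetical per-scene NeRF: a fixed function of 3D position."""
    x, y, z = p
    return [0.5, 0.2 * x, 0.1, abs(y + z)]

rng = random.Random(1)
points = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(32)]
text_emb = [0.3, -0.7, 0.1, 0.9]  # stand-in for a text token embedding

hyper = ToyDynamicHypernetwork(text_dim=4, layer_sizes=[3, 16, 4])
student = hyper.forward(text_emb, points)
teacher = [teacher_nerf(p) for p in points]
loss = distillation_loss(student, teacher)
```

In this toy setup, a single set of generator parameters serves every scene: changing `text_emb` changes the generated NeRF weights, which is the sense in which one network can be distilled from many per-scene teachers.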
