Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks

Abstract

Solving image-to-3D from a single view is an ill-posed problem, and current neural reconstruction methods that address it through diffusion models still rely on scene-specific optimization, which constrains their generalization capability. To overcome the limitations of existing approaches regarding generalization and consistency, we introduce a novel neural rendering technique. Our approach employs the signed distance function (SDF) as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks. Specifically, our method builds neural encoding volumes from generated multi-view inputs. At test time, we adjust the weights of the SDF network conditioned on the input image, which allows the model to adapt to novel scenes in a feed-forward manner via HyperNetworks. To mitigate artifacts derived from the synthesized views, we propose a volume transformer module that improves the aggregation of image features instead of processing each viewpoint separately. With our proposed method, dubbed Hyper-VolTran, we avoid the bottleneck of scene-specific optimization and maintain consistency across the images generated from multiple viewpoints. Our experiments show the advantages of the proposed approach, with consistent results and rapid generation.
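The idea of conditioning the SDF network's weights on an input image through a hypernetwork can be illustrated with a minimal sketch. It is not the authors' implementation: the names `HyperNetwork`, `generate`, and `sdf`, the layer sizes, and the single-linear-layer weight generator are all assumptions made for illustration, and the weights are random rather than trained. It only shows how one feed-forward pass can turn an image embedding into the parameters of a small SDF MLP, with no per-scene optimization.

```python
import math
import random


def linear(weights, bias, x):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


class HyperNetwork:
    """Hypothetical hypernetwork: image embedding -> parameters of an SDF MLP."""

    def __init__(self, embed_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # The target MLP maps 3 -> hidden_dim -> 1, so it needs
        # 3*h (w1) + h (b1) + h (w2) + 1 (b2) parameters.
        self.n_params = 5 * hidden_dim + 1
        # Assumption: one linear layer emits all target parameters at once.
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(embed_dim)]
                  for _ in range(self.n_params)]
        self.b = [0.0] * self.n_params

    def generate(self, embedding):
        """Produce (w1, b1, w2, b2) for the SDF MLP in a single forward pass."""
        flat = linear(self.W, self.b, embedding)
        h = self.hidden_dim
        w1 = [flat[i * 3:(i + 1) * 3] for i in range(h)]  # h rows of 3 weights
        b1 = flat[3 * h:4 * h]
        w2 = [flat[4 * h:5 * h]]                          # 1 row of h weights
        b2 = flat[5 * h:5 * h + 1]
        return w1, b1, w2, b2


def sdf(params, point):
    """Evaluate the generated MLP at a 3D point; returns a scalar distance."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(v) for v in linear(w1, b1, point)]
    return linear(w2, b2, hidden)[0]


# Two made-up image embeddings stand in for two different input images.
hyper = HyperNetwork(embed_dim=4, hidden_dim=8)
params_a = hyper.generate([1.0, 0.0, 0.5, -0.5])
params_b = hyper.generate([0.0, 1.0, -0.5, 0.5])

query = [0.1, 0.2, 0.3]
d_a = sdf(params_a, query)
d_b = sdf(params_b, query)
```

The same query point gets a different signed distance for each embedding, because each embedding yields its own set of SDF weights. In a trained system the hypernetwork's own parameters would be learned across many objects, which is what would let it adapt to a new image without further optimization.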
