FlexiDreamer: Single Image-to-3D Generation with FlexiCubes

Abstract

3D content generation from text prompts or single images has recently made remarkable progress in both quality and speed. One of the dominant paradigms is to generate consistent multi-view images and then perform sparse-view reconstruction. However, because it is difficult to deform a mesh representation directly toward the target topology, most methods learn an implicit representation (such as NeRF) during sparse-view reconstruction and obtain the target mesh through a post-processing extraction step. Although an implicit representation can model rich 3D information effectively, training it typically takes a long time to converge. In addition, extracting the mesh from the implicit field after training introduces undesirable visual artifacts. In this paper, we propose FlexiDreamer, a novel single image-to-3D generation framework that reconstructs the target mesh in an end-to-end manner. By leveraging a flexible gradient-based extraction known as FlexiCubes, our method avoids the defects introduced by post-processing and obtains the target mesh directly. Furthermore, we incorporate a multi-resolution hash grid encoding scheme that progressively activates the encoding levels in the implicit field of FlexiCubes, which helps capture geometric details during per-step optimization. Notably, FlexiDreamer recovers a dense 3D structure from a single-view image in approximately 1 minute on a single NVIDIA A100 GPU, outperforming previous methods by a large margin.
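The progressive activation of encoding levels mentioned above can be illustrated with a small coarse-to-fine masking sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function names `active_level_mask` and `mask_features`, the linear schedule, and all default values (16 levels, 4 initially active, one new level every 100 steps, 2 features per level) are hypothetical and do not come from the abstract.

```python
def active_level_mask(step, num_levels=16, start_levels=4, steps_per_level=100):
    """Return a 0/1 mask with one entry per encoding level, coarsest level first.

    Assumed schedule (hypothetical): start with `start_levels` coarse levels
    and switch on one finer level every `steps_per_level` optimization steps.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    active = min(num_levels, start_levels + step // steps_per_level)
    return [1.0 if level < active else 0.0 for level in range(num_levels)]


def mask_features(features, mask, features_per_level=2):
    """Zero out the encoded features that belong to inactive levels.

    `features` is a flat list laid out level by level, so its length must be
    len(mask) * features_per_level.
    """
    expected = len(mask) * features_per_level
    if len(features) != expected:
        raise ValueError(f"expected {expected} features, got {len(features)}")
    return [f * mask[i // features_per_level] for i, f in enumerate(features)]


if __name__ == "__main__":
    encoded = [1.0] * 32  # stand-in for the hash grid output at one query point
    for step in (0, 250, 2000):
        mask = active_level_mask(step)
        masked = mask_features(encoded, mask)
        print(step, int(sum(mask)), "levels active,", int(sum(masked)), "features kept")
```

Early steps therefore see only low-frequency features, and finer levels contribute as optimization proceeds. The actual schedule and level count used by FlexiDreamer may differ.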

