Wonder3D: Single Image to 3D using Cross-Domain Diffusion

Abstract

In this work, we introduce Wonder3D, a novel method for efficiently generating high-fidelity textured meshes from single-view images. Recent methods based on Score Distillation Sampling (SDS) have shown the potential to recover 3D geometry from 2D diffusion priors, but they typically suffer from time-consuming per-shape optimization and inconsistent geometry. In contrast, certain works directly produce 3D information via fast network inference, but their results are often of low quality and lack geometric detail. To holistically improve the quality, consistency, and efficiency of image-to-3D tasks, we propose a cross-domain diffusion model that generates multi-view normal maps and the corresponding color images. To ensure consistency, we employ a multi-view cross-domain attention mechanism that facilitates information exchange across views and modalities. Lastly, we introduce a geometry-aware normal fusion algorithm that extracts high-quality surfaces from the multi-view 2D representations. Our extensive evaluations demonstrate that our method achieves high-quality reconstruction results, robust generalization, and reasonably good efficiency compared to prior works.
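
The abstract does not spell out how information is exchanged across views, so the following is only a minimal sketch of one common way to realize multi-view attention: tokens from all views are pooled into a single sequence so that every token can attend to tokens from every other view. It is not the authors' implementation. The function name `multiview_attention`, the single-head form, the random projection matrices, and the sizes `V`, `N`, `D` are all assumptions chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multiview_attention(tokens, w_q, w_k, w_v):
    """Single-head attention over tokens pooled from all views (illustrative).

    tokens: array of shape (V, N, D), V views with N tokens of width D each.
    Returns the attended tokens, shape (V, N, D_out), and the attention
    weights, shape (V*N, V*N).
    """
    v, n, d = tokens.shape
    flat = tokens.reshape(v * n, d)  # pool tokens from every view
    q = flat @ w_q
    k = flat @ w_k
    val = flat @ w_v
    # Scaled dot-product attention across the pooled sequence.
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)
    out = weights @ val
    return out.reshape(v, n, -1), weights


# Hypothetical sizes: 3 views, 4 tokens per view, 8 features per token.
rng = np.random.default_rng(0)
V, N, D = 3, 4, 8
tokens = rng.normal(size=(V, N, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) for _ in range(3))

out, weights = multiview_attention(tokens, w_q, w_k, w_v)

# Changing only the last view also changes the output for the first view,
# which is the cross-view information exchange the sketch is meant to show.
perturbed = tokens.copy()
perturbed[-1] += 1.0
out_perturbed, _ = multiview_attention(perturbed, w_q, w_k, w_v)
first_view_changed = not np.allclose(out[0], out_perturbed[0])
```

Under the same assumption, the color and normal-map tokens of a view could be pooled in the same manner to exchange information across modalities; whether Wonder3D does it this way is not stated in the abstract.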
