MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement

Abstract

Existing work on single-image human reconstruction either generalizes poorly because of insufficient training data or produces 3D inconsistencies because it lacks comprehensive multi-view knowledge. In this paper, we introduce MagicMan, a human-specific multi-view diffusion model designed to generate high-quality novel view images from a single reference image. At its core, we leverage a pre-trained 2D diffusion model as the generative prior for generalizability, with the parametric SMPL-X model as the 3D body prior to promote 3D awareness. To tackle the critical challenge of maintaining consistency while achieving dense multi-view generation for improved 3D human reconstruction, we first introduce hybrid multi-view attention to facilitate both efficient and thorough information interchange across different views. Additionally, we present a geometry-aware dual branch that generates concurrently in both the RGB and normal domains, further enhancing consistency via geometry cues. Finally, to address the ill-shaped results that arise when an inaccurate SMPL-X estimate conflicts with the reference image, we propose a novel iterative refinement strategy, which progressively optimizes SMPL-X accuracy while enhancing the quality and consistency of the generated multi-views. Extensive experimental results demonstrate that our method significantly outperforms existing approaches in both novel view synthesis and subsequent 3D human reconstruction tasks.
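To make the idea of information interchange across views more concrete, the following is a minimal sketch of generic dense cross-view self-attention, in which every token of every view attends to the tokens of all views. It is not the hybrid multi-view attention proposed in the paper, whose design the abstract does not specify. The function names, the toy token values, and the absence of learned query/key/value projections are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_view_attention(views):
    """Dense self-attention over the tokens of all views at once.

    views: list of views; each view is a list of tokens; each token is a
    list of floats. Queries, keys and values are the raw tokens, with no
    learned projections (a simplification assumed for this sketch).
    """
    # Flatten so that a token in one view can attend to every other view.
    tokens = [tok for view in views for tok in view]
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)

    mixed = []
    for query in tokens:
        scores = [scale * sum(a * b for a, b in zip(query, key)) for key in tokens]
        weights = softmax(scores)
        mixed.append(
            [sum(w * value[d] for w, value in zip(weights, tokens)) for d in range(dim)]
        )

    # Restore the original per-view layout.
    result, start = [], 0
    for view in views:
        result.append(mixed[start:start + len(view)])
        start += len(view)
    return result


# Two hypothetical views, each with two 2-D tokens.
views = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
mixed = cross_view_attention(views)
```

Because the attention runs over the flattened token list, its cost grows quadratically with the total number of tokens across views. That cost is one plausible reason a method would look for a more efficient variant when generating dense views.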
