Snap-Snap: Taking Two Images to Reconstruct 3D Human Gaussians in Milliseconds

Abstract

Reconstructing 3D human bodies from sparse views is an appealing research topic and is crucial for broadening related applications. In this paper, we propose a challenging but valuable task: reconstructing the human body from only two images, i.e., the front and back views, which can greatly lower the barrier for users to create their own 3D digital humans. The main challenges lie in building 3D consistency and recovering missing information from such highly sparse input. We redesign a geometry reconstruction model based on foundation reconstruction models so that, with training on extensive human data, it predicts consistent point clouds even when the input images have scarce overlap. Furthermore, an enhancement algorithm is applied to supplement the missing color information, yielding complete colored human point clouds, which are directly transformed into 3D Gaussians for better rendering quality. Experiments show that our method can reconstruct the entire human in 190 ms on a single NVIDIA RTX 4090 from two images at a resolution of 1024x1024, and that it achieves state-of-the-art performance on the THuman2.0 and cross-domain datasets. Additionally, our method can complete human reconstruction even with images captured by low-cost mobile devices, reducing the requirements for data collection. Demos and code are available at https://hustvl.github.io/Snap-Snap/.
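The abstract states that the colored point cloud is transformed directly into 3D Gaussians but does not say how each Gaussian's parameters are set. As a rough illustration only, the sketch below assumes one isotropic Gaussian per point, with its center and color copied from the point, its scale set to the distance to the nearest neighboring point, an identity rotation, and a constant opacity. The names `Gaussian` and `points_to_gaussians`, and the `opacity` and `min_scale` defaults, are hypothetical and are not taken from the Snap-Snap code.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    mean: tuple      # (x, y, z) center, copied from the point
    color: tuple     # (r, g, b), copied from the point
    scale: float     # isotropic radius (assumed: nearest-neighbor distance)
    rotation: tuple  # unit quaternion (w, x, y, z); identity in this sketch
    opacity: float   # assumed constant, not a value from the paper


def points_to_gaussians(points, colors, opacity=0.9, min_scale=1e-4):
    """Turn N colored points into N isotropic Gaussians (illustrative only)."""
    if len(points) != len(colors):
        raise ValueError("points and colors must have the same length")

    gaussians = []
    for i, p in enumerate(points):
        # Brute-force nearest-neighbor search, O(N^2): acceptable for a
        # sketch, far too slow for a real-time pipeline.
        nearest = min(
            (math.dist(p, q) for j, q in enumerate(points) if j != i),
            default=min_scale,  # used when the cloud has a single point
        )
        gaussians.append(
            Gaussian(
                mean=tuple(p),
                color=tuple(colors[i]),
                scale=max(nearest, min_scale),
                rotation=(1.0, 0.0, 0.0, 0.0),
                opacity=opacity,
            )
        )
    return gaussians


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
    cols = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    for g in points_to_gaussians(pts, cols):
        print(g)
```

Tying the scale to local point spacing is a common heuristic for covering the surface without large gaps or heavy overlap. The method described in the abstract may set these parameters differently, for example by predicting them with a network.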
