Single-View 3D Human Digitalization with Large Reconstruction Models

Abstract

In this paper, we introduce Human-LRM, a single-stage feed-forward Large Reconstruction Model designed to predict human Neural Radiance Fields (NeRF) from a single image. Our approach adapts readily to training on large datasets that contain both 3D scans and multi-view captures. Furthermore, to improve the model's applicability to in-the-wild scenarios, especially those with occlusions, we propose a novel strategy that distills multi-view reconstruction into single-view reconstruction via a conditional triplane diffusion model. This generative extension addresses the inherent variation in human body shape when observed from a single view, and makes it possible to reconstruct the full human body from an occluded image. Through extensive experiments, we show that Human-LRM surpasses previous methods by a significant margin on several benchmarks.
