Relightable and Animatable Neural Avatar from Sparse-View Video

Abstract

This paper tackles the problem of creating relightable and animatable neural avatars from sparse-view (or even monocular) videos of dynamic humans captured under unknown illumination. Compared with studio capture, this setting is more practical and accessible, but it is a severely ill-posed problem. Previous neural human reconstruction methods can reconstruct animatable avatars from sparse views using deformed Signed Distance Fields (SDFs), but they cannot recover the material parameters needed for relighting. Differentiable inverse rendering methods have succeeded in recovering materials for static objects, but extending them to dynamic humans is not straightforward, because computing pixel-surface intersections and light visibility on deformed SDFs is computationally intensive. To address this, we propose a Hierarchical Distance Query (HDQ) algorithm that approximates world-space distances under arbitrary human poses. Specifically, we estimate coarse distances from a parametric human model and compute fine distances by exploiting the local deformation invariance of the SDF. Building on HDQ, we use sphere tracing to efficiently estimate surface intersections and light visibility. This allows us to develop the first system to recover animatable and relightable neural avatars from sparse-view (or monocular) inputs. Experiments demonstrate that our approach produces superior results compared to state-of-the-art methods. Our code will be released for reproducibility.
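
The coarse-to-fine distance query and its use in sphere tracing can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy setup in which a sphere stands in for the parametric human model, the pose is a pure translation, and the canonical SDF is a slightly smaller sphere; every name and constant (POSE_OFFSET, PROXY_RADIUS, BODY_RADIUS, NEAR_THRESHOLD, hdq, sphere_trace, light_visible) is hypothetical.

```python
import math

# All names and constants below are hypothetical, chosen only for illustration.
POSE_OFFSET = (0.0, 0.0, 3.0)  # toy "pose": world = canonical + offset
PROXY_RADIUS = 1.0             # sphere standing in for the parametric body model
BODY_RADIUS = 0.9              # toy canonical-space surface, inside the proxy
NEAR_THRESHOLD = 0.2           # assumed distance at which the fine query takes over


def coarse_distance(p):
    """Coarse level: distance from world point p to the posed proxy sphere."""
    return math.dist(p, POSE_OFFSET) - PROXY_RADIUS


def fine_distance(p):
    """Fine level: map p to canonical space and evaluate the canonical SDF."""
    q = tuple(pi - oi for pi, oi in zip(p, POSE_OFFSET))
    return math.dist(q, (0.0, 0.0, 0.0)) - BODY_RADIUS


def hdq(p):
    """Return the coarse distance far from the body, the fine SDF value near it."""
    d = coarse_distance(p)
    return d if d > NEAR_THRESHOLD else fine_distance(p)


def sphere_trace(origin, direction, eps=1e-4, t_max=20.0, max_steps=64):
    """March along a unit-length ray; return the hit distance t, or None on a miss."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = hdq(p)
        if dist < eps:
            return t
        t += dist  # safe step: the queried distance never exceeds the true one here
        if t > t_max:
            return None
    return None


def light_visible(point, light_dir, bias=1e-3):
    """A light is visible from point if a ray toward it hits nothing."""
    start = tuple(p + bias * l for p, l in zip(point, light_dir))
    return sphere_trace(start, light_dir) is None
```

In this toy scene, a camera ray from the origin along +z reaches the surface after two coarse-then-fine steps, and the same tracer answers the light visibility query by marching from the surface point toward the light. Because the toy deformation is a translation, the canonical SDF value equals the world-space distance exactly; for an articulated body this holds only approximately and only near the surface, which is why the sketch falls back to the coarse proxy distance farther away.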
