Drivable 3D Gaussian Avatars

Abstract

We present Drivable 3D Gaussian Avatars (D3GA), the first 3D controllable model for human bodies rendered with Gaussian splats. Current photorealistic drivable avatars require accurate 3D registrations during training, dense input images during testing, or both. Those based on neural radiance fields (NeRFs) also tend to be prohibitively slow for telepresence applications. This work uses the recently presented 3D Gaussian Splatting (3DGS) technique to render realistic humans at real-time framerates, using dense calibrated multi-view videos as input. To deform these primitives, we depart from the commonly used point deformation method, linear blend skinning (LBS), and instead use a classic volumetric deformation method: cage deformations. We drive these deformations with joint angles and keypoints, which, being compact, are better suited to communication applications than dense images. Experiments on nine subjects with varied body shapes, clothes, and motions show higher-quality results than state-of-the-art methods when both are given the same training and test data.
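LBS moves individual points, while a cage deformation warps a whole volume, so each region also carries a local linear map. The following is a minimal sketch of that idea for a single Gaussian. It rests on several assumptions that are not taken from the paper: the cage is made of tetrahedra, each Gaussian is embedded in its cell by barycentric coordinates, and its covariance is transformed by the cell's deformation gradient F as F Σ Fᵀ. All function names (`barycentric`, `deformation_gradient`, `deform_gaussian`, and the matrix helpers) are hypothetical illustrations, not D3GA's code.

```python
# Hypothetical sketch: deforming one 3D Gaussian (mean, covariance) with a
# single tetrahedral cage cell. Not the D3GA implementation.

def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("degenerate tetrahedron")
    adj = [[e * i - f * h, -(b * i - c * h), b * f - c * e],
           [-(d * i - f * g), a * i - c * g, -(a * f - c * d)],
           [d * h - e * g, -(a * h - b * g), a * e - b * d]]
    return [[x / det for x in row] for row in adj]

def edge_matrix(tet):
    """3x3 matrix whose columns are the edges v1-v0, v2-v0, v3-v0."""
    return [[tet[c + 1][r] - tet[0][r] for c in range(3)] for r in range(3)]

def barycentric(p, tet):
    """Barycentric weights (w0..w3) of point p inside tetrahedron tet."""
    local = matvec(inv3(edge_matrix(tet)), [p[r] - tet[0][r] for r in range(3)])
    return [1 - sum(local)] + local

def interpolate(weights, tet):
    """Point with the given barycentric weights in (possibly deformed) tet."""
    return [sum(weights[k] * tet[k][r] for k in range(4)) for r in range(3)]

def deformation_gradient(rest, deformed):
    """Constant linear map F of the cell: F = E_deformed @ inv(E_rest)."""
    return matmul(edge_matrix(deformed), inv3(edge_matrix(rest)))

def deform_gaussian(mean, cov, rest, deformed):
    # The center follows the cage through its barycentric embedding.
    new_mean = interpolate(barycentric(mean, rest), deformed)
    # The covariance is stretched and rotated by the cell's linear map.
    f = deformation_gradient(rest, deformed)
    new_cov = matmul(matmul(f, cov), transpose(f))
    return new_mean, new_cov

# Usage: stretch a unit tetrahedron to twice its length along x.
rest = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
stretched = [[0, 0, 0], [2, 0, 0], [0, 1, 0], [0, 0, 1]]
cov = [[0.01, 0, 0], [0, 0.01, 0], [0, 0, 0.01]]
mean, new_cov = deform_gaussian([0.25, 0.25, 0.25], cov, rest, stretched)
print(mean)  # → [0.5, 0.25, 0.25]
```

In this example the Gaussian's center moves with the cage, and its variance along x grows from 0.01 to 0.04. A point-only deformation would move the center but would not say how to reshape the splat. A full avatar would use many tetrahedra and pick the cell containing each Gaussian. How D3GA builds its cages and drives them from joint angles and keypoints is not shown here.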