GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers
September 6, 2024
Authors: Lorenza Prospero, Abdullah Hamdi, Joao F. Henriques, Christian Rupprecht
cs.AI
Abstract
Reconstructing realistic 3D human models from monocular images has
significant applications in creative industries, human-computer interfaces, and
healthcare. We base our work on 3D Gaussian Splatting (3DGS), a scene
representation composed of a mixture of Gaussians. Predicting such mixtures for
a human from a single input image is challenging, as it is a non-uniform
density (with a many-to-one relationship with input pixels) with strict
physical constraints. At the same time, it needs to be flexible to accommodate
a variety of clothes and poses. Our key observation is that the vertices of
standardized human meshes (such as SMPL) can provide an adequate density and
approximate initial position for Gaussians. We can then train a transformer
model to jointly predict comparatively small adjustments to these positions, as
well as the other Gaussians' attributes and the SMPL parameters. We show
empirically that this combination (using only multi-view supervision) can
achieve fast inference of 3D human models from a single image without test-time
optimization, expensive diffusion models, or 3D point supervision. We also
show that it can improve 3D pose estimation by better fitting human models that
account for clothes and other variations. The code is available on the project
website https://abdullahamdi.com/gst/.
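To make the mechanism described above concrete, here is a minimal PyTorch sketch under stated assumptions: it is not the authors' released code (see the project website for that), and every module name, dimension, and constant is illustrative. It shows only the core idea of anchoring one Gaussian at each SMPL vertex and letting a transformer predict small position offsets plus the remaining Gaussian attributes; joint prediction of the SMPL parameters themselves, and the multi-view rendering supervision, are omitted.

```python
# Minimal sketch (assumption, NOT the authors' implementation): a transformer
# attends jointly over image patch tokens and SMPL vertex tokens, then
# predicts per-Gaussian offsets and attributes. All names, dimensions, and
# the 0.05 offset bound are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_GAUSS = 1024  # subsampled here to keep attention cheap; the full SMPL
                  # mesh has 6,890 vertices (one Gaussian per vertex)

class GaussianHead(nn.Module):
    """Maps one token per SMPL vertex to 3DGS attributes."""
    def __init__(self, dim=256):
        super().__init__()
        self.offset = nn.Linear(dim, 3)    # small XYZ adjustment to the vertex
        self.scale = nn.Linear(dim, 3)     # anisotropic scale (log-space)
        self.rotation = nn.Linear(dim, 4)  # quaternion, normalized below
        self.opacity = nn.Linear(dim, 1)
        self.color = nn.Linear(dim, 3)     # RGB (could be SH coefficients)

    def forward(self, tokens, vertices):
        # Gaussian centers = SMPL vertices + comparatively small offsets,
        # bounded by tanh so predictions stay near the body surface.
        centers = vertices + 0.05 * torch.tanh(self.offset(tokens))
        return {
            "centers": centers,
            "scales": torch.exp(self.scale(tokens)),
            "rotations": F.normalize(self.rotation(tokens), dim=-1),
            "opacities": torch.sigmoid(self.opacity(tokens)),
            "colors": torch.sigmoid(self.color(tokens)),
        }

class GSTSketch(nn.Module):
    def __init__(self, dim=256, depth=4, heads=8):
        super().__init__()
        self.vert_embed = nn.Linear(3, dim)  # embed SMPL vertex positions
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = GaussianHead(dim)

    def forward(self, image_tokens, smpl_vertices):
        # Concatenate image patch tokens with one token per SMPL vertex so
        # the transformer can attend jointly over both sets.
        x = torch.cat([image_tokens, self.vert_embed(smpl_vertices)], dim=1)
        x = self.encoder(x)[:, -smpl_vertices.shape[1]:]  # keep vertex tokens
        return self.head(x, smpl_vertices)

# Usage with dummy inputs: ViT-style patch features and a posed SMPL mesh.
model = GSTSketch()
img_tokens = torch.randn(1, 196, 256)   # e.g., 14x14 patch features
verts = torch.randn(1, NUM_GAUSS, 3)    # (subsampled) SMPL vertex positions
gaussians = model(img_tokens, verts)
print(gaussians["centers"].shape)       # torch.Size([1, 1024, 3])
```

Bounding the offsets keeps each Gaussian near its originating SMPL vertex, which reflects the paper's key observation that the standardized mesh already supplies an adequate density and initial placement; the transformer only has to learn comparatively small corrections for clothing and other variation.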