GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers
September 6, 2024
Authors: Lorenza Prospero, Abdullah Hamdi, Joao F. Henriques, Christian Rupprecht
cs.AI
Abstract
Reconstructing realistic 3D human models from monocular images has significant applications in creative industries, human-computer interfaces, and healthcare. We base our work on 3D Gaussian Splatting (3DGS), a scene representation composed of a mixture of Gaussians. Predicting such mixtures for a human from a single input image is challenging, as the mixture has a non-uniform density (with a many-to-one relationship to input pixels) and is subject to strict physical constraints. At the same time, it needs to be flexible enough to accommodate a variety of clothes and poses. Our key observation is that the vertices of standardized human meshes (such as SMPL) can provide an adequate density and approximate initial positions for the Gaussians. We can then train a transformer model to jointly predict comparatively small adjustments to these positions, as well as the remaining Gaussian attributes and the SMPL parameters. We show empirically that this combination (using only multi-view supervision) can achieve fast inference of 3D human models from a single image without test-time optimization, expensive diffusion models, or 3D point supervision. We also show that it can improve 3D pose estimation by better fitting human models that account for clothes and other variations. The code is available on the project website: https://abdullahamdi.com/gst/
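To make the abstract's core mechanism concrete, the sketch below shows one plausible PyTorch realization: Gaussians are anchored at SMPL vertices, and a prediction head regresses small position offsets plus the remaining 3DGS attributes from per-vertex transformer tokens. This is not the authors' implementation; the module layout, feature dimension, the 0.05 offset bound, and the random stand-in inputs are illustrative assumptions.

```python
# Minimal sketch (assumed design, not the authors' code) of anchoring one
# Gaussian per SMPL vertex and predicting small adjustments with a head on
# top of per-vertex transformer tokens.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_SMPL_VERTICES = 6890  # vertex count of the standard SMPL mesh
FEATURE_DIM = 256         # illustrative token dimension

class GaussianHead(nn.Module):
    """Maps per-vertex tokens to 3DGS parameters, one Gaussian per vertex."""
    def __init__(self, dim: int = FEATURE_DIM):
        super().__init__()
        self.offset = nn.Linear(dim, 3)    # small xyz adjustment per vertex
        self.scale = nn.Linear(dim, 3)     # per-axis scale, predicted in log space
        self.rotation = nn.Linear(dim, 4)  # quaternion
        self.opacity = nn.Linear(dim, 1)
        self.color = nn.Linear(dim, 3)     # RGB here; full 3DGS uses SH coefficients

    def forward(self, tokens, smpl_vertices):
        # Centers = SMPL vertices + bounded offsets; the 0.05 cap is an
        # illustrative way to keep the adjustments "comparatively small".
        centers = smpl_vertices + 0.05 * torch.tanh(self.offset(tokens))
        return {
            "xyz": centers,
            "scale": torch.exp(self.scale(tokens)),
            "rotation": F.normalize(self.rotation(tokens), dim=-1),
            "opacity": torch.sigmoid(self.opacity(tokens)),
            "color": torch.sigmoid(self.color(tokens)),
        }

# Usage: in the real model the tokens would come from a transformer attending
# between image-patch features and per-vertex queries; random tensors stand in.
tokens = torch.randn(1, NUM_SMPL_VERTICES, FEATURE_DIM)
smpl_vertices = torch.randn(1, NUM_SMPL_VERTICES, 3)  # posed SMPL mesh (stub)
gaussians = GaussianHead()(tokens, smpl_vertices)
print(gaussians["xyz"].shape)  # torch.Size([1, 6890, 3])
```

In the setting the abstract describes, the predicted Gaussians would then be rendered into the available views with a differentiable 3DGS rasterizer, and the photometric error against ground-truth images would supply the only (multi-view) supervision signal.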