ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image Collections
June 7, 2023
Authors: Chun-Han Yao, Amit Raj, Wei-Chih Hung, Yuanzhen Li, Michael Rubinstein, Ming-Hsuan Yang, Varun Jampani
cs.AI
Abstract
Estimating 3D articulated shapes like animal bodies from monocular images is
inherently challenging due to the ambiguities of camera viewpoint, pose,
texture, lighting, etc. We propose ARTIC3D, a self-supervised framework to
reconstruct per-instance 3D shapes from a sparse in-the-wild image collection.
Specifically, ARTIC3D is built upon a skeleton-based surface representation and
is further guided by 2D diffusion priors from Stable Diffusion. First, we
enhance input images that contain occlusions or truncation via 2D diffusion to
obtain cleaner mask estimates and semantic features. Second, we perform
diffusion-guided 3D optimization to estimate shape and texture that are
high-fidelity and faithful to the input images. We also propose a novel
technique that computes image-level gradients via diffusion models more stably
than existing alternatives. Finally, we produce realistic animations by fine-tuning
the rendered shape and texture under rigid part transformations. Extensive
evaluations on multiple existing datasets as well as newly introduced noisy web
image collections with occlusions and truncation demonstrate that ARTIC3D
outputs are more robust to noisy images, of higher quality in terms of shape
and texture details, and more realistic when animated. Project page:
https://chhankyao.github.io/artic3d/
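
To make the phrase "rigid part transformations" concrete, the following is a minimal sketch of rotating one body part of a mesh about a joint. It is not the ARTIC3D implementation: the function names (`rotation_matrix`, `transform_part`), the per-vertex part labels, and the toy data are all assumptions made for illustration, and a hard one-part-per-vertex assignment is used here only for simplicity.

```python
import numpy as np

def rotation_matrix(axis, angle):
    """Rodrigues' formula: rotation by `angle` radians about `axis`."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    x, y, z = axis
    # Skew-symmetric cross-product matrix of the unit axis.
    K = np.array([[0.0, -z, y],
                  [z, 0.0, -x],
                  [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def transform_part(vertices, part_ids, part, joint, axis, angle):
    """Rigidly rotate the vertices labelled `part` about the point `joint`.

    Hypothetical helper: vertices is (N, 3), part_ids is (N,) of ints.
    """
    R = rotation_matrix(axis, angle)
    out = vertices.copy()
    mask = part_ids == part
    # Move the joint to the origin, rotate, then move back.
    out[mask] = (vertices[mask] - joint) @ R.T + joint
    return out

# Toy example (assumed data): three vertices on the x-axis, two parts.
vertices = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [2.0, 0.0, 0.0]])
part_ids = np.array([0, 0, 1])
joint = np.array([1.0, 0.0, 0.0])

# Rotate part 1 by 90 degrees about the z-axis through the joint.
posed = transform_part(vertices, part_ids, 1, joint, [0, 0, 1], np.pi / 2)
```

Because the transformation is rigid, the distance from each moved vertex to the joint is unchanged, while vertices belonging to other parts stay where they were.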