ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image Collections

Abstract

Estimating 3D articulated shapes such as animal bodies from monocular images is inherently challenging due to ambiguities in camera viewpoint, pose, texture, lighting, etc. We propose ARTIC3D, a self-supervised framework to reconstruct per-instance 3D shapes from a sparse in-the-wild image collection. Specifically, ARTIC3D is built upon a skeleton-based surface representation and is further guided by 2D diffusion priors from Stable Diffusion. First, we enhance input images that contain occlusions or truncation via 2D diffusion to obtain cleaner mask estimates and semantic features. Second, we perform diffusion-guided 3D optimization to estimate shape and texture that are high-fidelity and faithful to the input images. We also propose a novel technique that computes more stable image-level gradients via diffusion models than existing alternatives. Finally, we produce realistic animations by fine-tuning the rendered shape and texture under rigid part transformations. Extensive evaluations on multiple existing datasets, as well as newly introduced noisy web image collections with occlusions and truncation, demonstrate that ARTIC3D outputs are more robust to noisy images, of higher quality in terms of shape and texture details, and more realistic when animated. Project page: https://chhankyao.github.io/artic3d/
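
The "rigid part transformations" mentioned for the animation step can be illustrated with a minimal sketch. It is not ARTIC3D's implementation: it assumes each vertex is hard-assigned to exactly one part and that a part rotates about a single joint, and the names rotate_about_joint and pose_part are hypothetical, chosen only for this example.

```python
import math

def rotate_about_joint(point, joint, axis, angle):
    """Rotate a 3D point by `angle` radians around a unit-length `axis`
    passing through `joint`, using Rodrigues' rotation formula."""
    # Express the point relative to the joint so the joint itself stays fixed.
    v = [p - j for p, j in zip(point, joint)]
    k = axis
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    k_dot_v = sum(ki * vi for ki, vi in zip(k, v))
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    rotated = [
        v[i] * cos_a + k_cross_v[i] * sin_a + k[i] * k_dot_v * (1.0 - cos_a)
        for i in range(3)
    ]
    # Move back from joint-relative to world coordinates.
    return [r + j for r, j in zip(rotated, joint)]

def pose_part(vertices, part_ids, part, joint, axis, angle):
    """Rigidly rotate only the vertices assigned to `part` (hypothetical
    hard assignment); all other vertices are copied unchanged."""
    return [
        rotate_about_joint(v, joint, axis, angle) if pid == part else list(v)
        for v, pid in zip(vertices, part_ids)
    ]

# Three vertices along the x-axis; the last two belong to part 1,
# which is rotated 90 degrees about the z-axis through the joint at x = 1.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
posed = pose_part(vertices, [0, 1, 1], part=1,
                  joint=(1.0, 0.0, 0.0), axis=(0.0, 0.0, 1.0),
                  angle=math.pi / 2)
```

In this toy case the vertex at the joint stays in place and the outer vertex swings from the x-axis up to approximately (1, 1, 0), while the vertex assigned to part 0 does not move. Because the rotation is rigid, distances within the part are preserved.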
