ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image Collections

Abstract

Estimating 3D articulated shapes such as animal bodies from monocular images is inherently challenging due to ambiguities in camera viewpoint, pose, texture, lighting, etc. We propose ARTIC3D, a self-supervised framework that reconstructs per-instance 3D shapes from a sparse image collection in the wild. Specifically, ARTIC3D is built upon a skeleton-based surface representation and is further guided by 2D diffusion priors from Stable Diffusion. First, we use 2D diffusion to enhance input images that contain occlusions or truncation, obtaining cleaner mask estimates and semantic features. Second, we perform diffusion-guided 3D optimization to estimate shape and texture that are of high fidelity and faithful to the input images. We also propose a novel technique for calculating image-level gradients via diffusion models that are more stable than those of existing alternatives. Finally, we produce realistic animations by fine-tuning the rendered shape and texture under rigid part transformations. Extensive evaluations on multiple existing datasets, as well as on newly introduced noisy web image collections with occlusions and truncation, demonstrate that ARTIC3D outputs are more robust to noisy images, of higher quality in terms of shape and texture details, and more realistic when animated. Project page: https://chhankyao.github.io/artic3d/
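As a rough illustration of what a rigid part transformation means for a skeleton-based shape, the sketch below rotates the vertices assigned to one part about a joint while leaving the rest of the surface untouched. It is a toy example under stated assumptions, not the ARTIC3D implementation: the function names (`rotation_about_axis`, `transform_part`), the hard per-vertex part mask, and the four-vertex "limb" are all hypothetical and do not come from the paper.

```python
import numpy as np


def rotation_about_axis(axis, angle):
    """Rodrigues' formula: 3x3 rotation matrix for an axis and an angle in radians."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    x, y, z = axis
    # Skew-symmetric cross-product matrix of the unit axis.
    K = np.array([[0.0, -z, y],
                  [z, 0.0, -x],
                  [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def transform_part(vertices, part_mask, joint, axis, angle):
    """Rigidly rotate the vertices selected by part_mask about the joint position.

    Assumption: each vertex belongs to exactly one part (a hard mask).
    """
    R = rotation_about_axis(axis, angle)
    posed = np.array(vertices, dtype=float)  # copy so the input is not modified
    # Move the part to the joint's frame, rotate, and move it back.
    posed[part_mask] = (posed[part_mask] - joint) @ R.T + joint
    return posed


# Hypothetical limb: four vertices along the x-axis, with a joint at x = 1.
vertices = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [2.0, 0.0, 0.0],
                     [3.0, 0.0, 0.0]])
part_mask = np.array([False, False, True, True])
joint = np.array([1.0, 0.0, 0.0])

# Bend the outer part by 90 degrees about the z-axis.
posed = transform_part(vertices, part_mask, joint, axis=[0.0, 0.0, 1.0], angle=np.pi / 2)
print(posed)
```

Because the transformation is rigid, distances between vertices within the rotated part are preserved; only the relative pose between parts changes, which is the property an animation step of this kind relies on.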
