ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image Collections

Abstract

Estimating 3D articulated shapes such as animal bodies from monocular images is inherently challenging because camera viewpoint, pose, texture, and lighting are all ambiguous. We propose ARTIC3D, a self-supervised framework that reconstructs per-instance 3D shapes from a sparse in-the-wild image collection. ARTIC3D builds on a skeleton-based surface representation and is further guided by 2D diffusion priors from Stable Diffusion. First, we use 2D diffusion to enhance input images that contain occlusions or truncation, which yields cleaner mask estimates and semantic features. Second, we perform diffusion-guided 3D optimization to estimate shape and texture that are high-fidelity and faithful to the input images. We also propose a novel technique for computing image-level gradients via diffusion models that are more stable than those of existing alternatives. Finally, we produce realistic animations by fine-tuning the rendered shape and texture under rigid part transformations. Extensive evaluations on multiple existing datasets, as well as on newly introduced noisy web image collections with occlusions and truncation, demonstrate that ARTIC3D outputs are more robust to noisy images, of higher quality in shape and texture detail, and more realistic when animated. Project page: https://chhankyao.github.io/artic3d/
