ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image Collections

Abstract

Estimating 3D articulated shapes such as animal bodies from monocular images is inherently challenging due to ambiguities in camera viewpoint, pose, texture, lighting, and other factors. We propose ARTIC3D, a self-supervised framework that reconstructs per-instance 3D shapes from a sparse, in-the-wild image collection. Specifically, ARTIC3D is built upon a skeleton-based surface representation and is further guided by 2D diffusion priors from Stable Diffusion. First, we use 2D diffusion to enhance input images that contain occlusions or truncation, yielding cleaner mask estimates and semantic features. Second, we perform diffusion-guided 3D optimization to estimate shape and texture that are both high-fidelity and faithful to the input images. We also propose a novel technique that computes more stable image-level gradients via diffusion models than existing alternatives. Finally, we produce realistic animations by fine-tuning the rendered shape and texture under rigid part transformations. Extensive evaluations on multiple existing datasets, as well as on newly introduced noisy web image collections with occlusions and truncation, demonstrate that ARTIC3D outputs are more robust to noisy images, of higher quality in shape and texture detail, and more realistic when animated. Project page: https://chhankyao.github.io/artic3d/
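The abstract contrasts its gradient technique with "existing alternatives" without naming them. We assume here that the best-known alternative, Score Distillation Sampling (SDS) from DreamFusion, is among them. The minimal NumPy sketch below shows the standard SDS gradient only; it is not ARTIC3D's proposed technique. `predict_noise` is a hypothetical stand-in for a pretrained denoiser, while the toy `oracle` predictor, the linear beta schedule, and the weighting w(t) = 1 - ᾱ_t are illustrative choices. With latent diffusion models such as Stable Diffusion, the same residual is computed on the encoded latent rather than on image pixels.

```python
import numpy as np


def sds_gradient(x, t, alphas_cumprod, predict_noise, rng):
    """Standard SDS gradient w(t) * (eps_hat - eps) with respect to a rendering x.

    predict_noise(x_t, t) is a hypothetical stand-in for a pretrained
    diffusion model's noise predictor (in practice, a conditioned UNet).
    """
    abar = alphas_cumprod[t]
    eps = rng.standard_normal(x.shape)
    # Forward diffusion: noise the current rendering to timestep t.
    x_t = np.sqrt(abar) * x + np.sqrt(1.0 - abar) * eps
    eps_hat = predict_noise(x_t, t)
    # Illustrative weighting w(t) = 1 - abar_t; other choices are common.
    w = 1.0 - abar
    # SDS drops the denoiser Jacobian and uses this residual directly as dL/dx.
    return w * (eps_hat - eps)


# Toy setup: linear beta schedule and an "oracle" denoiser that knows a
# target image x0, so the SDS update should pull x toward x0.
alphas_cumprod = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))
x0 = np.ones((4, 4))


def oracle(x_t, t):
    abar = alphas_cumprod[t]
    return (x_t - np.sqrt(abar) * x0) / np.sqrt(1.0 - abar)


rng = np.random.default_rng(0)
x = np.zeros((4, 4))
for _ in range(200):
    t = rng.integers(20, 980)
    x -= 0.1 * sds_gradient(x, t, alphas_cumprod, oracle, rng)
```

With the oracle, the injected noise cancels and the residual reduces to a positive multiple of (x - x0), so gradient descent moves x toward x0. With a real denoiser the noise does not cancel. The resulting gradient variance is one plausible reason to look for more stable image-level gradients.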
