Reconstructing Animatable Categories from Videos

Abstract

Building animatable 3D models is challenging because it requires 3D scans, laborious registration, and manual rigging, all of which are difficult to scale to arbitrary categories. Recent differentiable rendering methods offer a pathway to high-quality 3D models from monocular videos, but they are limited to rigid categories or single instances. We present RAC, which builds category-level 3D models from monocular videos while disentangling variation across instances from motion over time. Three key ideas make this possible: (1) specializing a skeleton to each instance via optimization, (2) a latent space regularization method that encourages shared structure across a category while preserving instance details, and (3) using 3D background models to disentangle objects from the background. We show that 3D models of humans, cats, and dogs can be learned from 50-100 internet videos.
