Reconstructing Animatable Categories from Videos
May 10, 2023
Authors: Gengshan Yang, Chaoyang Wang, N Dinesh Reddy, Deva Ramanan
cs.AI
Abstract
Building animatable 3D models is challenging due to the need for 3D scans,
laborious registration, and manual rigging, which are difficult to scale to
arbitrary categories. Recently, differentiable rendering provides a pathway to
obtain high-quality 3D models from monocular videos, but these are limited to
rigid categories or single instances. We present RAC, which builds category-level
3D models from monocular videos while disentangling variation across instances
from motion over time. Three key ideas are introduced to solve this problem: (1)
specializing a skeleton to instances via optimization, (2) a method for latent
space regularization that encourages shared structure across a category while
maintaining instance details, and (3) using 3D background models to disentangle
objects from the background. We show that 3D models of humans, cats, and dogs
can be learned from 50-100 internet videos.
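The second key idea, latent-space regularization that encourages shared structure across a category while preserving instance detail, can be illustrated with a minimal sketch. The function below is a hypothetical stand-in for the paper's actual loss: it simply penalizes the deviation of each per-instance latent code from the category mean, so codes stay clustered (shared structure) while non-zero residuals retain instance-specific detail. The function name, the quadratic penalty, and the `beta` weight are illustrative assumptions, not details from the paper.

```python
import numpy as np

def latent_regularizer(codes, beta=0.1):
    """Hypothetical regularizer: penalize deviation of per-instance
    latent codes from the category mean code.

    codes: (num_instances, code_dim) array of instance embeddings.
    beta:  weight of the regularization term (assumed hyperparameter).
    Returns a scalar penalty; 0 when all instances share one code.
    """
    mean_code = codes.mean(axis=0, keepdims=True)   # shared category code
    residuals = codes - mean_code                   # instance-specific detail
    return beta * np.mean(np.sum(residuals ** 2, axis=1))

# Identical instances incur no penalty; spread-out codes are penalized.
identical = np.ones((4, 3))
varied = np.array([[0.0, 0.0], [1.0, 1.0]])
print(latent_regularizer(identical))  # 0.0
print(latent_regularizer(varied))     # > 0
```

In a full training loop, such a term would be added to the rendering loss, so optimization trades off instance fidelity against staying close to the shared category structure.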