
Reconstructing Animatable Categories from Videos

May 10, 2023
Authors: Gengshan Yang, Chaoyang Wang, N Dinesh Reddy, Deva Ramanan
cs.AI

Abstract

Building animatable 3D models is challenging due to the need for 3D scans, laborious registration, and manual rigging, which are difficult to scale to arbitrary categories. Recently, differentiable rendering has provided a pathway to obtain high-quality 3D models from monocular videos, but these are limited to rigid categories or single instances. We present RAC, which builds category-level 3D models from monocular videos while disentangling variation over instances and motion over time. Three key ideas are introduced to solve this problem: (1) specializing a skeleton to instances via optimization, (2) a method for latent space regularization that encourages shared structure across a category while maintaining instance details, and (3) using 3D background models to disentangle objects from the background. We show that 3D models of humans, cats, and dogs can be learned from 50-100 internet videos.
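To make idea (2) concrete, the sketch below illustrates one common way such a latent-space regularizer can be set up. This is a hypothetical, minimal example, not the paper's actual implementation: the names `beta` (per-instance morphology codes), `theta` (per-frame articulation codes), and the mean-pulling penalty are illustrative assumptions about how instance variation might be kept close to a shared category structure while small deviations preserve instance detail.

```python
import numpy as np

# Illustrative sketch (names and design are assumptions, not the paper's code):
# each video i gets a time-invariant instance code beta[i] (morphology) and
# per-frame articulation codes theta[i, t] (motion over time).
rng = np.random.default_rng(0)
num_videos, num_frames, code_dim = 4, 8, 16

beta = rng.normal(size=(num_videos, code_dim))                # instance variation
theta = rng.normal(size=(num_videos, num_frames, code_dim))   # motion over time


def latent_regularizer(beta, weight=0.1):
    """Encourage shared category structure: penalize instance codes that
    stray far from the category mean; small deviations are cheap, so
    instance-specific detail can still survive the penalty."""
    mean_code = beta.mean(axis=0)
    return weight * float(np.sum((beta - mean_code) ** 2))


# In training, this term would be added to the rendering/reconstruction loss.
reg = latent_regularizer(beta)  # always >= 0; zero iff all codes are identical
```

The weight trades off category sharing against instance fidelity: a large weight collapses all instances toward one shared shape, while a small weight lets each instance drift freely.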