Dynamic Camera Poses and Where to Find Them

Abstract

Annotating camera poses on dynamic Internet videos at scale is critical for advancing fields such as realistic video generation and simulation. However, collecting such a dataset is difficult, as most Internet videos are unsuitable for pose estimation. Furthermore, annotating dynamic Internet videos presents significant challenges even for state-of-the-art methods. In this paper, we introduce DynPose-100K, a large-scale dataset of dynamic Internet videos annotated with camera poses. Our collection pipeline addresses filtering using a carefully combined set of task-specific and generalist models. For pose estimation, we combine the latest techniques in point tracking, dynamic masking, and structure-from-motion to improve on state-of-the-art approaches. Our analysis and experiments demonstrate that DynPose-100K is both large-scale and diverse across several key attributes, opening up avenues for advancements in various downstream applications.
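The abstract only names the ingredients of the pose-estimation stage. As a hedged illustration of how point tracks and dynamic masks can be combined before structure-from-motion, the sketch keeps only tracks that never land on a pixel marked dynamic in any frame where they are visible, so that SfM sees static-scene correspondences. This is a hypothetical example, not the DynPose-100K implementation: the function name `static_track_mask`, the array layouts (tracks as `(T, N, 2)` pixel `(x, y)` coordinates, visibility as `(T, N)` booleans, masks as `(T, H, W)` booleans with `True` meaning dynamic) and the `min_visible` threshold are all assumptions.

```python
import numpy as np


def static_track_mask(tracks, visible, dynamic_masks, min_visible=2):
    """Hypothetical filter: return a (N,) bool array of tracks usable for SfM.

    tracks:        (T, N, 2) float array of (x, y) pixel positions per frame.
    visible:       (T, N) bool array, True where the tracker reports the point visible.
    dynamic_masks: (T, H, W) bool array, True where a pixel is considered dynamic.
    min_visible:   assumed minimum number of visible frames for a track to be kept.
    """
    T, N, _ = tracks.shape
    _, H, W = dynamic_masks.shape

    # Round each track position to the nearest pixel and clamp it inside the image.
    xs = np.clip(np.rint(tracks[..., 0]).astype(int), 0, W - 1)
    ys = np.clip(np.rint(tracks[..., 1]).astype(int), 0, H - 1)

    # Look up the mask value under every (frame, track) position; the (T, 1)
    # frame index broadcasts against the (T, N) coordinate arrays.
    frame_idx = np.arange(T)[:, None]
    on_dynamic = dynamic_masks[frame_idx, ys, xs]  # shape (T, N)

    # Reject a track if it touches a dynamic pixel in any frame where it is visible.
    touches_dynamic = np.any(on_dynamic & visible, axis=0)

    # Require enough visible frames to constrain triangulation.
    enough_views = visible.sum(axis=0) >= min_visible

    return enough_views & ~touches_dynamic


if __name__ == "__main__":
    # Two 4x4 frames; one dynamic pixel per frame.
    masks = np.zeros((2, 4, 4), dtype=bool)
    masks[0, 0, 0] = True  # frame 0: pixel (x=0, y=0) is dynamic
    masks[1, 3, 3] = True  # frame 1: pixel (x=3, y=3) is dynamic

    tracks = np.array([
        [[0, 0], [2, 2], [1, 2]],  # frame 0 positions of tracks 0, 1, 2
        [[1, 1], [2, 2], [3, 3]],  # frame 1 positions of tracks 0, 1, 2
    ], dtype=float)
    visible = np.array([
        [True, True, True],
        [True, True, False],  # track 2 is occluded in frame 1
    ])

    print(static_track_mask(tracks, visible, masks, min_visible=1).tolist())
```

In the usage example, track 0 is rejected because it sits on a dynamic pixel in frame 0, track 1 is kept, and track 2 is kept with `min_visible=1` because its only overlap with a dynamic pixel happens while it is marked occluded. The surviving tracks would then be passed to an SfM solver as correspondences; which solver and which masking model to use are left open here.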
