

RGB-Only Supervised Camera Parameter Optimization in Dynamic Scenes

September 18, 2025
Authors: Fang Li, Hao Zhang, Narendra Ahuja
cs.AI

Abstract

Although COLMAP has long remained the predominant method for camera parameter optimization in static scenes, its application to dynamic scenes is constrained by lengthy runtime and a reliance on ground-truth (GT) motion masks. Many efforts have attempted to improve it by incorporating additional priors as supervision, such as GT focal length, motion masks, 3D point clouds, camera poses, and metric depth, which, however, are typically unavailable in casually captured RGB videos. In this paper, we propose a novel method for more accurate and efficient camera parameter optimization in dynamic scenes, supervised solely by a single RGB video. Our method consists of three key components: (1) Patch-wise Tracking Filters, which establish robust and maximally sparse hinge-like relations across the RGB video; (2) Outlier-aware Joint Optimization, which efficiently optimizes camera parameters by adaptively down-weighting moving outliers, without relying on motion priors; and (3) a Two-stage Optimization Strategy, which improves stability and optimization speed through a trade-off between the Softplus limits and convex minima in the losses. We evaluate our camera estimates visually and numerically. To further validate accuracy, we feed the camera estimates into a 4D reconstruction method and assess the resulting 3D scenes as well as the rendered 2D RGB and depth maps. We perform experiments on 4 real-world datasets (NeRF-DS, DAVIS, iPhone, and TUM-dynamics) and 1 synthetic dataset (MPI-Sintel), demonstrating that our method estimates camera parameters more efficiently and accurately with a single RGB video as the only supervision.
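
The abstract gives no implementation details, but the spirit of the outlier-aware down-weighting and the Softplus-limited parameters can be illustrated with a minimal, self-contained sketch. Everything below (the toy similarity-transform camera model, the median-based robust weighting, and all variable names) is an assumption made for illustration only, not the authors' code:

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch: jointly optimize a focal-like scale and a 2D shift
# (standing in for full camera parameters) against tracked correspondences,
# while adaptively down-weighting moving outliers instead of using a motion mask.

torch.manual_seed(0)
pts_a = torch.randn(200, 2)                      # tracked points in frame A
true_scale, true_shift = 1.2, torch.tensor([0.3, -0.2])
pts_b = true_scale * pts_a + true_shift          # points consistent with camera motion
pts_b[:20] += torch.randn(20, 2) * 2.0           # a few "moving" outliers violating the model

raw_scale = torch.zeros(1, requires_grad=True)   # unconstrained; Softplus keeps the scale positive
shift = torch.zeros(2, requires_grad=True)
opt = torch.optim.Adam([raw_scale, shift], lr=0.05)

for step in range(300):
    opt.zero_grad()
    scale = F.softplus(raw_scale) + 1e-3         # Softplus limit: strictly positive parameter
    residual = (scale * pts_a + shift - pts_b).norm(dim=1)
    with torch.no_grad():
        # Adaptive down-weighting: residuals far above the running (median) scale
        # receive near-zero weight, so moving outliers barely influence the update.
        s = residual.median() + 1e-6
        weights = 1.0 / (1.0 + (residual / (3.0 * s)) ** 2)
    loss = (weights * residual ** 2).mean()
    loss.backward()
    opt.step()

print("estimated scale:", F.softplus(raw_scale).item(), "estimated shift:", shift.detach())
```

In this toy setup the weights shrink toward zero for points whose residuals are far above the typical residual, which serves the role of a motion mask without requiring one; the paper's actual formulation of the filters, losses, and two-stage schedule is, of course, more involved than this sketch.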