Visual Sync: Multi-Camera Synchronization via Cross-View Object Motion
December 1, 2025
Authors: Shaowei Liu, David Yifan Yao, Saurabh Gupta, Shenlong Wang
cs.AI
Abstract
Today, people can easily record memorable moments, ranging from concerts and sports events to lectures, family gatherings, and birthday parties, with multiple consumer cameras. However, synchronizing these cross-camera streams remains challenging. Existing methods assume controlled settings, specific targets, manual correction, or costly hardware. We present VisualSync, an optimization framework based on multi-view dynamics that aligns unposed, unsynchronized videos with millisecond accuracy. Our key insight is that any moving 3D point, when co-visible in two cameras, obeys epipolar constraints once properly synchronized. To exploit this, VisualSync leverages off-the-shelf 3D reconstruction, feature matching, and dense tracking to extract tracklets, relative poses, and cross-view correspondences. It then jointly minimizes the epipolar error to estimate each camera's time offset. Experiments on four diverse, challenging datasets show that VisualSync outperforms baseline methods, achieving a median synchronization error below 50 ms.
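To make the key insight concrete, the sketch below illustrates (in Python) how one could score candidate time offsets with an epipolar-error objective. This is not the authors' implementation; the fundamental matrix `F`, the tracklet arrays, the timestamp vectors, and the offset grid are all assumed inputs, and a simple grid search stands in for the paper's joint optimization over all cameras.

```python
# Minimal sketch of the core idea: for a moving 3D point visible in two
# cameras, matched observations satisfy the epipolar constraint only at the
# correct time offset, so we pick the offset that minimizes epipolar error.
# All names (F, t1, track1, t2, track2, offsets) are illustrative assumptions.

import numpy as np


def epipolar_error(F, x1, x2):
    """Sampson epipolar distance between matched 2D points.

    F  : (3, 3) fundamental matrix mapping view-1 points to view-2 epipolar lines.
    x1 : (N, 2) points in view 1; x2 : (N, 2) points in view 2.
    """
    ones = np.ones((x1.shape[0], 1))
    p1 = np.hstack([x1, ones])          # homogeneous coordinates, view 1
    p2 = np.hstack([x2, ones])          # homogeneous coordinates, view 2
    Fp1 = p1 @ F.T                      # epipolar lines in view 2
    Ftp2 = p2 @ F                       # epipolar lines in view 1
    num = np.sum(p2 * Fp1, axis=1) ** 2
    den = Fp1[:, 0] ** 2 + Fp1[:, 1] ** 2 + Ftp2[:, 0] ** 2 + Ftp2[:, 1] ** 2
    return num / np.maximum(den, 1e-12)


def estimate_offset(F, t1, track1, t2, track2, offsets):
    """Grid-search the time offset (seconds) applied to view 2's clock.

    t1, track1 : timestamps (M,) and 2D positions (M, 2) of a point in view 1.
    t2, track2 : timestamps (K,) and 2D positions (K, 2) of the same point in view 2.
    offsets    : 1D array of candidate offsets to evaluate.
    """
    best_dt, best_cost = None, np.inf
    for dt in offsets:
        # Resample view 2's track at view 1's timestamps shifted by dt.
        x2 = np.column_stack([
            np.interp(t1 + dt, t2, track2[:, 0]),
            np.interp(t1 + dt, t2, track2[:, 1]),
        ])
        cost = np.median(epipolar_error(F, track1, x2))  # robust aggregate
        if cost < best_cost:
            best_dt, best_cost = dt, cost
    return best_dt, best_cost
```

In practice, the paper's formulation aggregates this residual over many tracklets and camera pairs and optimizes all offsets jointly; the median here is just one robust choice for handling outlier correspondences.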