

Visual Sync: Multi-Camera Synchronization via Cross-View Object Motion

December 1, 2025
Authors: Shaowei Liu, David Yifan Yao, Saurabh Gupta, Shenlong Wang
cs.AI

Abstract

Today, people can easily record memorable moments such as concerts, sports events, lectures, family gatherings, and birthday parties with multiple consumer cameras. However, synchronizing these cross-camera streams remains challenging. Existing methods assume controlled settings, specific targets, manual correction, or costly hardware. We present VisualSync, an optimization framework based on multi-view dynamics that aligns unposed, unsynchronized videos at millisecond accuracy. Our key insight is that any moving 3D point, when co-visible in two cameras, obeys epipolar constraints once the videos are properly synchronized. To exploit this, VisualSync leverages off-the-shelf 3D reconstruction, feature matching, and dense tracking to extract tracklets, relative poses, and cross-view correspondences. It then jointly minimizes the epipolar error to estimate each camera's time offset. Experiments on four diverse, challenging datasets show that VisualSync outperforms baseline methods, achieving a median synchronization error below 50 ms.
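To make the core idea concrete, the sketch below illustrates the kind of objective the abstract describes: given a tracklet of a moving point observed in two cameras and their relative pose (here summarized by a fundamental matrix `F`), grid-search the time offset that minimizes the symmetric epipolar error. This is a minimal illustration of the epipolar-alignment principle, not the paper's implementation; the function names, interpolation scheme, search range, and step size are assumptions for the example.

```python
import numpy as np

def epipolar_distance(x1, x2, F):
    """Symmetric epipolar distance between matched pixel coordinates.

    x1, x2: (N, 2) arrays of points in camera 1 and camera 2.
    F: (3, 3) fundamental matrix mapping camera-1 points to epipolar lines in camera 2.
    """
    ones = np.ones((x1.shape[0], 1))
    p1 = np.hstack([x1, ones])                     # homogeneous coordinates, (N, 3)
    p2 = np.hstack([x2, ones])
    l2 = p1 @ F.T                                  # epipolar lines in image 2 (F x1)
    l1 = p2 @ F                                    # epipolar lines in image 1 (F^T x2)
    num = np.abs(np.sum(p2 * l2, axis=1))          # |x2^T F x1|
    d2 = num / np.linalg.norm(l2[:, :2], axis=1)   # point-to-line distance in image 2
    d1 = num / np.linalg.norm(l1[:, :2], axis=1)   # point-to-line distance in image 1
    return 0.5 * (d1 + d2)

def estimate_offset(track1, times1, track2, times2, F,
                    search_range=2.0, step=0.005):
    """Grid-search the offset dt (seconds) minimizing the mean epipolar error.

    track1, track2: (T, 2) pixel trajectories of a co-visible moving point.
    times1, times2: per-sample timestamps of each camera.
    Returns dt such that camera-2 time is approximately camera-1 time + dt.
    """
    offsets = np.arange(-search_range, search_range + step, step)
    errors = []
    for dt in offsets:
        t_query = times1 + dt
        valid = (t_query >= times2[0]) & (t_query <= times2[-1])
        if valid.sum() < 8:                        # require enough temporal overlap
            errors.append(np.inf)
            continue
        # Linearly resample the camera-2 track at the shifted camera-1 timestamps.
        x2_interp = np.stack([
            np.interp(t_query[valid], times2, track2[:, 0]),
            np.interp(t_query[valid], times2, track2[:, 1]),
        ], axis=1)
        errors.append(epipolar_distance(track1[valid], x2_interp, F).mean())
    return offsets[int(np.argmin(errors))]
```

In practice the paper's method aggregates such residuals over many tracklets and camera pairs and solves for all offsets jointly; the sketch only shows the single-pair, single-track case to make the constraint readable.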