OmniShotCut: Holistic Relational Shot Boundary Detection with Shot-Query Transformer
April 27, 2026
Authors: Boyang Wang, Guangyi Xu, Zhipeng Tang, Jiahui Zhang, Zezhou Cheng
cs.AI
Abstract
Shot Boundary Detection (SBD) aims to automatically identify shot changes and divide a video into coherent shots. Although SBD has been widely studied in the literature, existing state-of-the-art methods often produce non-interpretable boundaries on transitions, miss subtle yet harmful discontinuities, and rely on noisy, low-diversity annotations and outdated benchmarks. To alleviate these limitations, we propose OmniShotCut, which formulates SBD as structured relational prediction: a shot-query-based dense video Transformer jointly estimates shot ranges together with their intra-shot and inter-shot relations. To avoid imprecise manual labeling, we adopt a fully synthetic transition generation pipeline that automatically reproduces major transition families with precise boundaries and parameterized variants. We also introduce OmniShotCutBench, a modern wide-domain benchmark enabling holistic and diagnostic evaluation.
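To make the idea of synthetic transitions with exact boundaries concrete, here is a minimal sketch in plain Python. It is an illustration only, not the authors' pipeline: the `Segment` structure, the label names ("shot", "dissolve"), the `synthesize` function, and the linear blending schedule are all assumptions introduced for this example. Because the transition is generated rather than annotated, the frame indices where it starts and ends are known exactly, and its length acts as a parameter that yields variants.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: a frame is flattened to a list of pixel intensities in [0, 1].
Frame = List[float]


@dataclass
class Segment:
    start: int   # first frame index of the segment (inclusive)
    end: int     # last frame index of the segment (inclusive)
    label: str   # hypothetical label: "shot" or a transition family name


def blend(a: Frame, b: Frame, alpha: float) -> Frame:
    """Linear cross-fade between two frames; alpha=0 gives a, alpha=1 gives b."""
    return [(1.0 - alpha) * x + alpha * y for x, y in zip(a, b)]


def synthesize(clip_a: List[Frame], clip_b: List[Frame],
               kind: str = "dissolve",
               length: int = 4) -> Tuple[List[Frame], List[Segment]]:
    """Join two clips with a synthetic transition and return exact labels."""
    if kind == "hard_cut":
        frames = clip_a + clip_b
        return frames, [
            Segment(0, len(clip_a) - 1, "shot"),
            Segment(len(clip_a), len(frames) - 1, "shot"),
        ]
    if kind != "dissolve":
        raise ValueError(f"unsupported transition kind: {kind}")
    if not 0 < length < min(len(clip_a), len(clip_b)):
        raise ValueError("length must be positive and shorter than both clips")

    head = clip_a[:-length]          # frames of clip A left untouched
    tail = clip_b[length:]           # frames of clip B left untouched
    offset = len(clip_a) - length    # where the overlap starts in clip A
    mixed = [
        blend(clip_a[offset + i], clip_b[i], (i + 1) / (length + 1))
        for i in range(length)
    ]
    frames = head + mixed + tail
    segments = [
        Segment(0, len(head) - 1, "shot"),
        Segment(len(head), len(head) + length - 1, "dissolve"),
        Segment(len(head) + length, len(frames) - 1, "shot"),
    ]
    return frames, segments


# Usage: two 6-frame single-pixel clips (black, then white), 4-frame dissolve.
clip_a = [[0.0] for _ in range(6)]
clip_b = [[1.0] for _ in range(6)]
frames, segments = synthesize(clip_a, clip_b, kind="dissolve", length=4)
```

In this sketch the transition itself is a labeled segment with its own range, rather than a single boundary index. Varying `length` or swapping the blending schedule gives parameterized variants of the same transition family without any manual labeling.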