Online Generic Event Boundary Detection
October 8, 2025
Authors: Hyungrok Jung, Daneul Kim, Seunggyun Lim, Jeany Son, Jonghyun Choi
cs.AI
Abstract
Generic Event Boundary Detection (GEBD) aims to interpret long-form videos
through the lens of human perception. However, current GEBD methods must
process all frames of a video before making predictions, unlike humans, who
process data online and in real time. To bridge this gap, we introduce a new task,
Online Generic Event Boundary Detection (On-GEBD), aiming to detect boundaries
of generic events immediately in streaming videos. This task poses unique
challenges: identifying subtle, taxonomy-free event changes in real time,
without access to future frames. To tackle these challenges, we propose a
novel On-GEBD framework, Estimator, inspired by Event Segmentation Theory (EST),
which explains how humans segment ongoing activity into events by leveraging
the discrepancies between predicted and actual information. Our framework
consists of two key components: the Consistent Event Anticipator (CEA) and the
Online Boundary Discriminator (OBD). Specifically, the CEA generates a
prediction of the future frame reflecting current event dynamics based solely
on prior frames. Then, the OBD measures the prediction error and adaptively
adjusts the threshold using statistical tests on past errors to capture
diverse, subtle event transitions. Experimental results demonstrate that
Estimator outperforms all baselines adapted from recent online video
understanding models and achieves performance comparable to prior offline GEBD
methods on the Kinetics-GEBD and TAPOS datasets.
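
The predict-then-compare loop described in the abstract can be illustrated with a minimal sketch. The abstract does not say which statistical test the OBD applies or how the CEA produces its prediction, so everything below is an assumption made for illustration: a z-score test over a sliding window of past errors stands in for the OBD, and the previous frame's features stand in for the CEA's learned prediction. The names `AdaptiveBoundaryDetector`, `prediction_error`, and `detect_boundaries` are hypothetical and do not come from the paper.

```python
from collections import deque
from statistics import mean, stdev


class AdaptiveBoundaryDetector:
    """Flags a boundary when the current prediction error is an outlier
    relative to recent errors. The z-score test is an assumption; the
    abstract only mentions "statistical tests on past errors"."""

    def __init__(self, window=20, z_threshold=3.0, min_history=5):
        self.errors = deque(maxlen=window)       # sliding window of past errors
        self.z_threshold = z_threshold
        self.min_history = max(2, min_history)   # stdev needs >= 2 samples

    def update(self, error):
        is_boundary = False
        if len(self.errors) >= self.min_history:
            mu = mean(self.errors)
            sigma = stdev(self.errors)
            # The threshold adapts to the statistics of recent errors.
            threshold = mu + self.z_threshold * max(sigma, 1e-8)
            is_boundary = error > threshold
        self.errors.append(error)
        return is_boundary


def prediction_error(predicted, actual):
    """Mean squared error between predicted and observed feature vectors."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def detect_boundaries(features, detector):
    """Streams over per-frame feature vectors using only past frames.
    The previous frame serves as a naive prediction of the current one
    (a stand-in for a learned anticipator, not the paper's CEA)."""
    boundaries = []
    for t in range(1, len(features)):
        err = prediction_error(features[t - 1], features[t])
        if detector.update(err):
            boundaries.append(t)
    return boundaries


# Toy stream: ten frames of one "event", then ten frames of another.
stream = [[0.01 * (i % 2), 0.0] for i in range(10)]
stream += [[5.0 + 0.01 * (i % 2), 5.0] for i in range(10)]
print(detect_boundaries(stream, AdaptiveBoundaryDetector()))
```

In this sketch the error at a detected boundary is still appended to the history, which temporarily raises the threshold right after a transition; whether the actual method does something similar is not stated in the abstract.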