Online Generic Event Boundary Detection

Abstract

Generic Event Boundary Detection (GEBD) aims to interpret long-form videos through the lens of human perception. However, current GEBD methods must process all frames of a video before making predictions, unlike humans, who process information online and in real time. To bridge this gap, we introduce a new task, Online Generic Event Boundary Detection (On-GEBD), which aims to detect the boundaries of generic events immediately in streaming videos. This task poses the unique challenge of identifying subtle, taxonomy-free event changes in real time, without access to future frames. To tackle this challenge, we propose a novel On-GEBD framework, Estimator, inspired by Event Segmentation Theory (EST), which explains how humans segment ongoing activity into events by leveraging the discrepancies between predicted and actual information. Our framework consists of two key components: the Consistent Event Anticipator (CEA) and the Online Boundary Discriminator (OBD). Specifically, the CEA generates a prediction of the future frame that reflects the dynamics of the current event, based solely on prior frames. The OBD then measures the prediction error and adaptively adjusts the detection threshold using statistical tests on past errors, so that it can capture diverse and subtle event transitions. Experimental results demonstrate that Estimator outperforms all baselines adapted from recent online video understanding models and achieves performance comparable to prior offline GEBD methods on the Kinetics-GEBD and TAPOS datasets.
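
To make the predict-then-compare idea concrete, the following is a minimal sketch of an adaptive-threshold boundary test over a stream of prediction errors. It is not the Estimator implementation: the abstract does not specify which statistical test the OBD uses, so a rolling mean-plus-k-standard-deviations rule is assumed here purely for illustration. The function name `detect_boundaries`, the parameters `window`, `z_threshold`, and `min_history`, and the error values are all hypothetical, and the errors are supplied directly instead of being computed by a frame predictor such as the CEA.

```python
from collections import deque
from statistics import mean, stdev


def detect_boundaries(errors, window=8, z_threshold=3.0, min_history=4):
    """Flag time steps whose prediction error is an outlier relative to
    recent past errors. Assumed stand-in for the OBD step, not the paper's test."""
    history = deque(maxlen=window)  # only past errors are kept, never future ones
    boundaries = []
    for t, err in enumerate(errors):
        if len(history) >= min_history:
            mu = mean(history)
            sigma = stdev(history)
            # The threshold adapts to the error statistics of the current event.
            threshold = mu + z_threshold * max(sigma, 1e-6)
            if err > threshold:
                boundaries.append(t)
                history.clear()  # a new event starts, so restart the statistics
                continue
        history.append(err)
    return boundaries


# Hypothetical per-frame prediction errors with jumps at steps 5 and 11.
errors = [0.10, 0.12, 0.11, 0.09, 0.10, 0.95,
          0.30, 0.28, 0.31, 0.29, 0.30, 0.90]
print(detect_boundaries(errors))  # prints [5, 11]
```

Because the threshold is derived from the recent error history, the same rule flags a jump from about 0.10 to 0.95 and a jump from about 0.30 to 0.90 without a hand-tuned global threshold, which is the property the adaptive thresholding in the abstract is meant to provide.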