Find the Leak, Fix the Split: Cluster-Based Method to Prevent Leakage in Video-Derived Datasets
November 17, 2025
Authors: Noam Glazner, Noam Tsfaty, Sharon Shalev, Avishai Weizman
cs.AI
Abstract
We propose a cluster-based frame selection strategy to mitigate information leakage in video-derived frame datasets. By grouping visually similar frames before splitting them into training, validation, and test sets, the method keeps near-duplicate frames from the same video out of different partitions and produces more representative, balanced, and reliable dataset splits.
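
The idea of clustering before splitting can be illustrated with a minimal sketch. The abstract does not specify the frame features, the clustering algorithm, or the split ratios, so everything below is an assumption made for illustration: frames are represented by hypothetical precomputed feature vectors, clustering is a simple greedy threshold scheme, and the function names `cluster_frames` and `split_by_cluster` are invented here. The only property the sketch is meant to show is that whole clusters, not individual frames, are assigned to a partition.

```python
import math
import random


def cluster_frames(features, threshold):
    """Greedy threshold clustering (an illustrative stand-in, not the paper's method).

    A frame joins the first cluster whose leader is within `threshold`
    (Euclidean distance); otherwise it starts a new cluster.
    """
    leaders = []  # one representative feature vector per cluster
    labels = []   # cluster id for each frame, in input order
    for vec in features:
        for cid, leader in enumerate(leaders):
            if math.dist(vec, leader) <= threshold:
                labels.append(cid)
                break
        else:
            leaders.append(vec)
            labels.append(len(leaders) - 1)
    return labels


def split_by_cluster(labels, ratios=(0.7, 0.15, 0.15), seed=0):
    """Assign whole clusters to train/val/test so similar frames never straddle splits.

    The ratios are assumed targets; they are only approximated because
    clusters are kept intact.
    """
    clusters = {}
    for idx, cid in enumerate(labels):
        clusters.setdefault(cid, []).append(idx)

    # Shuffle for tie-breaking, then place larger clusters first (stable sort).
    order = sorted(clusters)
    random.Random(seed).shuffle(order)
    order.sort(key=lambda c: len(clusters[c]), reverse=True)

    names = ("train", "val", "test")
    targets = [r * len(labels) for r in ratios]
    splits = {name: [] for name in names}
    for cid in order:
        # Put the cluster into the split that is furthest below its target size.
        deficits = [targets[i] - len(splits[names[i]]) for i in range(len(names))]
        best = deficits.index(max(deficits))
        splits[names[best]].extend(clusters[cid])
    return splits


# Toy 2-D "features": consecutive pairs mimic near-duplicate video frames.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0),
            (10.0, 0.0), (10.0, 0.1), (20.0, 20.0)]
labels = cluster_frames(features, threshold=1.0)
splits = split_by_cluster(labels)
```

In this toy input, frames 0 and 1 are near-duplicates and receive the same cluster label, so they always land in the same partition. A plain random split over individual frames gives no such guarantee, which is the leakage the abstract refers to.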