ChatPaper.ai


Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum

October 31, 2025
作者: Zhuoning Guo, Mingxin Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Xiaowen Chu
cs.AI

Abstract

The prevailing video retrieval paradigm is structurally misaligned, as narrow benchmarks incentivize correspondingly limited data and single-task training. Therefore, universal capability is suppressed due to the absence of a diagnostic evaluation that defines and demands multi-dimensional generalization. To break this cycle, we introduce a framework built on the co-design of evaluation, data, and modeling. First, we establish the Universal Video Retrieval Benchmark (UVRB), a suite of 16 datasets designed not only to measure performance but also to diagnose critical capability gaps across tasks and domains. Second, guided by UVRB's diagnostics, we introduce a scalable synthesis workflow that generates 1.55 million high-quality pairs to populate the semantic space required for universality. Finally, we devise the Modality Pyramid, a curriculum that trains our General Video Embedder (GVE) by explicitly leveraging the latent interconnections within our diverse data. Extensive experiments show GVE achieves state-of-the-art zero-shot generalization on UVRB. In particular, our analysis reveals that popular benchmarks are poor predictors of general ability and that partially relevant retrieval is a dominant but overlooked scenario. Overall, our co-designed framework provides a practical path to escape the limited scope and advance toward truly universal video retrieval.
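The abstract describes GVE as an embedding model evaluated via zero-shot retrieval on UVRB. As a minimal sketch of that evaluation setting (the function name `rank_videos` and the toy 2-D vectors are illustrative assumptions, not the paper's code or actual embeddings), zero-shot retrieval reduces to ranking pre-computed video embeddings by cosine similarity to a query embedding:

```python
import numpy as np

def rank_videos(query_emb: np.ndarray, video_embs: np.ndarray):
    """Return video indices sorted by cosine similarity to the query (best first)."""
    q = query_emb / np.linalg.norm(query_emb)
    v = video_embs / np.linalg.norm(video_embs, axis=1, keepdims=True)
    scores = v @ q              # cosine similarity of each video to the query
    order = np.argsort(-scores)  # indices in descending-similarity order
    return order, scores

# Toy example: 3 hypothetical "video" embeddings and one "query" embedding.
# In practice these would come from a video-text encoder such as GVE.
query = np.array([1.0, 0.0])
videos = np.array([[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
order, scores = rank_videos(query, videos)
```

Benchmarks like UVRB then score the resulting ranking (e.g. with recall@k or nDCG) without any task-specific fine-tuning, which is what makes the evaluation "zero-shot."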