Dr.V: A Hierarchical Perception-Temporal-Cognition Framework to Diagnose Video Hallucination by Fine-grained Spatial-Temporal Grounding
September 15, 2025
Authors: Meng Luo, Shengqiong Wu, Liqiang Jing, Tianjie Ju, Li Zheng, Jinxiang Lai, Tianlong Wu, Xinya Du, Jian Li, Siyuan Yan, Jiebo Luo, William Yang Wang, Hao Fei, Mong-Li Lee, Wynne Hsu
cs.AI
Abstract
Recent advancements in large video models (LVMs) have significantly enhanced
video understanding. However, these models continue to suffer from
hallucinations, producing content that conflicts with input videos. To address
this issue, we propose Dr.V, a hierarchical framework covering perceptive,
temporal, and cognitive levels to diagnose video hallucination by fine-grained
spatial-temporal grounding. Dr.V comprises two key components: a benchmark
dataset Dr.V-Bench and a satellite video agent Dr.V-Agent. Dr.V-Bench includes
10k instances drawn from 4,974 videos spanning diverse tasks, each enriched
with detailed spatial-temporal annotation. Dr.V-Agent detects hallucinations in
LVMs by systematically applying fine-grained spatial-temporal grounding at the
perceptive and temporal levels, followed by cognitive-level reasoning. This
step-by-step pipeline mirrors human-like video comprehension and effectively
identifies hallucinations. Extensive experiments demonstrate that Dr.V-Agent is
effective in diagnosing hallucination while enhancing interpretability and
reliability, offering a practical blueprint for robust video understanding in
real-world scenarios. All our data and code are available at
https://github.com/Eurekaleo/Dr.V.