WikiVideo: Article Generation from Multiple Videos
April 1, 2025
Authors: Alexander Martin, Reno Kriz, William Gantt Walden, Kate Sanders, Hannah Recknor, Eugene Yang, Francis Ferraro, Benjamin Van Durme
cs.AI
Abstract
We present the challenging task of automatically creating a high-level
Wikipedia-style article that aggregates information from multiple diverse
videos about real-world events, such as natural disasters or political
elections. Videos are intuitive sources for retrieval-augmented generation
(RAG), but most contemporary RAG workflows focus heavily on text, and existing
methods for video-based summarization focus on low-level scene understanding
rather than high-level event semantics. To close this gap, we introduce
WikiVideo, a benchmark consisting of expert-written articles and densely
annotated videos that provide evidence for articles' claims, facilitating the
integration of video into RAG pipelines and enabling the creation of in-depth
content that is grounded in multimodal sources. We further propose
Collaborative Article Generation (CAG), a novel interactive method for article
creation from multiple videos. CAG leverages an iterative interaction between
an r1-style reasoning model and a VideoLLM to draw higher-level inferences
about the target event than is possible with VideoLLMs alone, which fixate on
low-level visual features. We benchmark state-of-the-art VideoLLMs and CAG in
both oracle retrieval and RAG settings and find that CAG consistently
outperforms alternative methods, while suggesting intriguing avenues for future
work.
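The abstract describes CAG only at a high level, as an iterative interaction between a reasoning model and a VideoLLM. The following is a minimal sketch of what such a loop might look like, not the authors' implementation. The function name, the three callables (`ask_video_llm`, `reason`, `write_article`), the stopping rule, and `max_rounds` are all hypothetical placeholders introduced for illustration.

```python
from typing import Callable, List


def collaborative_article_generation(
    videos: List[str],
    topic: str,
    ask_video_llm: Callable[[str, str], str],    # hypothetical: (video, question) -> answer
    reason: Callable[[str, List[str]], str],     # hypothetical: (topic, notes) -> follow-up question, or "" to stop
    write_article: Callable[[str, List[str]], str],  # hypothetical: (topic, notes) -> article text
    max_rounds: int = 3,                         # assumed cap on interaction rounds
) -> str:
    """Illustrative loop: the reasoner steers the VideoLLM toward event-level detail."""
    notes: List[str] = []
    question = f"Describe what happens in this video about {topic}."
    for _ in range(max_rounds):
        # Query the VideoLLM about every video with the current question.
        for video in videos:
            notes.append(ask_video_llm(video, question))
        # Let the reasoning model inspect the notes and propose a follow-up.
        follow_up = reason(topic, notes)
        if not follow_up:
            break  # the reasoner has no further questions
        question = follow_up
    # Compose the final article from all collected notes.
    return write_article(topic, notes)
```

In this sketch the reasoning model never sees the videos directly. It only sees the VideoLLM's textual answers and decides what to ask next, which is one plausible way to move from low-level visual descriptions toward higher-level claims about the event.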