Towards Retrieval Augmented Generation over Large Video Libraries
June 21, 2024
Authors: Yannis Tevissen, Khalil Guetari, Frédéric Petitpont
cs.AI
Abstract
Video content creators need efficient tools to repurpose content, a task that
often requires complex manual or automated searches. Crafting a new video from
large video libraries remains a challenge. In this paper, we introduce the task
of Video Library Question Answering (VLQA) through an interoperable
architecture that applies Retrieval Augmented Generation (RAG) to video
libraries. We propose a system that uses large language models (LLMs) to
generate search queries, retrieving relevant video moments indexed by speech
and visual metadata. An answer generation module then integrates user queries
with this metadata to produce responses with specific video timestamps. This
approach shows promise in multimedia content retrieval and AI-assisted video
content creation.
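The abstract describes the pipeline only at a high level: generate search queries, retrieve moments indexed by speech and visual metadata, then answer with timestamps. The Python sketch below illustrates that flow under stated assumptions and is not the authors' implementation. The `Moment` schema, the stopword list, the keyword-overlap scoring and the template answer are all hypothetical stand-ins. In the described system, an LLM would generate the queries and compose the final response.

```python
import re
from dataclasses import dataclass, field

# Hypothetical schema for one indexed video moment; the abstract does not
# specify the actual metadata format.
@dataclass
class Moment:
    video_id: str
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds
    transcript: str  # speech metadata (e.g. ASR output)
    visual_tags: list[str] = field(default_factory=list)  # visual metadata

STOPWORDS = {"a", "an", "the", "of", "in", "on", "at", "and", "to", "is",
             "are", "do", "we", "what", "where", "about", "show", "me"}

def tokenize(text: str) -> set[str]:
    """Lowercased alphanumeric words, minus stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}

def generate_search_queries(question: str) -> list[str]:
    # Stand-in for the LLM query-generation step: a real system would
    # prompt an LLM; here the question's content words form one query.
    return [" ".join(sorted(tokenize(question)))]

def score(query: str, moment: Moment) -> int:
    # Keyword overlap against both speech and visual metadata
    # (a simple substitute for a real search index).
    indexed = tokenize(moment.transcript) | tokenize(" ".join(moment.visual_tags))
    return len(tokenize(query) & indexed)

def retrieve(queries: list[str], library: list[Moment], k: int = 3) -> list[Moment]:
    scored = [(max(score(q, m) for q in queries), m) for m in library]
    hits = [(s, m) for s, m in scored if s > 0]
    # Highest score first; ties broken by video id, then start time.
    hits.sort(key=lambda p: (-p[0], p[1].video_id, p[1].start))
    return [m for _, m in hits[:k]]

def fmt_ts(seconds: float) -> str:
    mins, secs = divmod(int(seconds), 60)
    return f"{mins:02d}:{secs:02d}"

def answer(question: str, library: list[Moment], k: int = 3) -> str:
    # Stand-in for the answer generation module: a template that cites each
    # retrieved moment with its timestamps instead of an LLM-written reply.
    moments = retrieve(generate_search_queries(question), library, k)
    if not moments:
        return "No matching moments found."
    lines = [f"- {m.video_id} [{fmt_ts(m.start)}-{fmt_ts(m.end)}]: {m.transcript}"
             for m in moments]
    return "Relevant moments:\n" + "\n".join(lines)

# Hypothetical example library.
library = [
    Moment("interview_01", 65.0, 92.0,
           "We installed solar panels on the roof last spring", ["roof", "solar panel"]),
    Moment("interview_01", 300.0, 330.0,
           "The budget discussion took most of the meeting", ["office"]),
    Moment("report_07", 12.0, 40.0,
           "A drone shot of the wind farm at sunset", ["wind turbine", "sunset"]),
]

print(answer("Where do we talk about solar panels?", library))
```

Running this prints only the interview_01 moment spanning 01:05 to 01:32. The query "wind turbine" would instead return report_07 from 00:12 to 00:40, matched on both its transcript and its visual tags. Replacing the two stand-in functions with LLM calls and the overlap score with a proper search index would bring the sketch closer to the architecture the abstract describes.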