Towards Retrieval Augmented Generation over Large Video Libraries

June 21, 2024
Authors: Yannis Tevissen, Khalil Guetari, Frédéric Petitpont
cs.AI

Abstract

Video content creators need efficient tools to repurpose content, a task that often requires complex manual or automated searches. Crafting a new video from large video libraries remains a challenge. In this paper, we introduce the task of Video Library Question Answering (VLQA) through an interoperable architecture that applies Retrieval Augmented Generation (RAG) to video libraries. We propose a system that uses large language models (LLMs) to generate search queries, retrieving relevant video moments indexed by speech and visual metadata. An answer generation module then integrates user queries with this metadata to produce responses with specific video timestamps. This approach shows promise in multimedia content retrieval and AI-assisted video content creation.
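
The three stages described in the abstract (query generation, retrieval over speech and visual metadata, and answer generation with timestamps) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Every name here (`Moment`, `generate_search_queries`, `retrieve`, `generate_answer`, `LIBRARY`) is hypothetical, the sample library is made-up data, and a simple stopword filter and keyword match stand in for the LLM and the search index so that the example runs without external services.

```python
from dataclasses import dataclass


@dataclass
class Moment:
    """One indexed video moment (hypothetical schema, not from the paper)."""
    video_id: str
    start: float  # start time in seconds
    end: float    # end time in seconds
    speech: str   # speech metadata, e.g. a transcript excerpt
    visual: str   # visual metadata, e.g. a caption of what is on screen


# Toy library with invented entries, used only for illustration.
LIBRARY = [
    Moment("interview_01", 12.0, 31.5,
           "we discuss solar panels on the roof", "man pointing at roof"),
    Moment("vlog_07", 95.0, 120.0,
           "today I am cooking pasta", "kitchen with boiling pot"),
    Moment("docu_03", 300.0, 342.0,
           "wind turbines at sea", "solar farm aerial shot"),
]

STOPWORDS = {"a", "an", "the", "of", "in", "on", "is", "are", "where", "what",
             "which", "do", "does", "we", "show", "about", "to", "and"}


def generate_search_queries(question: str) -> list[str]:
    """Assumed stand-in for the LLM step: keep only the content words."""
    words = [w.strip("?.,!").lower() for w in question.split()]
    return [w for w in words if w and w not in STOPWORDS]


def retrieve(queries: list[str], library: list[Moment], top_k: int = 2) -> list[Moment]:
    """Rank moments by how many query terms occur in speech or visual metadata."""
    scored = []
    for moment in library:
        tokens = set(f"{moment.speech} {moment.visual}".lower().split())
        score = sum(1 for q in queries if q in tokens)
        if score > 0:
            scored.append((score, moment))
    # list.sort is stable, so ties keep their library order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [moment for _, moment in scored[:top_k]]


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS, truncating fractional seconds."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def generate_answer(question: str, moments: list[Moment]) -> str:
    """Assumed stand-in for the answer module: cite each moment with timestamps."""
    if not moments:
        return "No matching moments found."
    lines = [f"Moments relevant to: {question}"]
    for m in moments:
        span = f"{format_timestamp(m.start)}-{format_timestamp(m.end)}"
        lines.append(f"- {m.video_id} [{span}]: {m.speech}")
    return "\n".join(lines)


def answer_question(question: str, library: list[Moment]) -> str:
    """Chain the three stages: queries, retrieval, answer with timestamps."""
    queries = generate_search_queries(question)
    return generate_answer(question, retrieve(queries, library))


if __name__ == "__main__":
    print(answer_question("Where do we show solar panels?", LIBRARY))
```

In a real system, `generate_search_queries` and `generate_answer` would presumably call an LLM, and `retrieve` would query a search index over the library's metadata. Keeping them as separate functions mirrors the modular structure the abstract describes.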
