
Pegasus-v1 Technical Report

April 23, 2024
Authors: Raehyuk Jung, Hyojun Go, Jaehyuk Yi, Jiho Jang, Daniel Kim, Jay Suh, Aiden Lee, Cooper Han, Jae Lee, Jeff Kim, Jin-Young Kim, Junwan Kim, Kyle Park, Lucas Lee, Mars Ha, Minjoon Seo, Abraham Jo, Ed Park, Hassan Kianinejad, SJ Kim, Tony Moon, Wade Jeong, Andrei Popescu, Esther Kim, EK Yoon, Genie Heo, Henry Choi, Jenna Kang, Kevin Han, Noah Seo, Sunny Nguyen, Ryan Won, Yeonhoo Park, Anthony Giuliani, Dave Chung, Hans Yoon, James Le, Jenny Ahn, June Lee, Maninder Saini, Meredith Sanders, Soyoung Lee, Sue Kim, Travis Couture
cs.AI

Abstract

This technical report introduces Pegasus-1, a multimodal language model specialized in video content understanding and interaction through natural language. Pegasus-1 is designed to address the unique challenges posed by video data, such as interpreting spatiotemporal information, and to offer nuanced comprehension of video content across various lengths. The report gives an overview of Pegasus-1's architecture, its training strategies, and its performance on benchmarks for video conversation, zero-shot video question answering, and video summarization. We also explore qualitative characteristics of Pegasus-1, demonstrating its capabilities as well as its limitations, in order to provide readers with a balanced view of its current state and its future direction.
