
Pegasus-v1 Technical Report

April 23, 2024
Authors: Raehyuk Jung, Hyojun Go, Jaehyuk Yi, Jiho Jang, Daniel Kim, Jay Suh, Aiden Lee, Cooper Han, Jae Lee, Jeff Kim, Jin-Young Kim, Junwan Kim, Kyle Park, Lucas Lee, Mars Ha, Minjoon Seo, Abraham Jo, Ed Park, Hassan Kianinejad, SJ Kim, Tony Moon, Wade Jeong, Andrei Popescu, Esther Kim, EK Yoon, Genie Heo, Henry Choi, Jenna Kang, Kevin Han, Noah Seo, Sunny Nguyen, Ryan Won, Yeonhoo Park, Anthony Giuliani, Dave Chung, Hans Yoon, James Le, Jenny Ahn, June Lee, Maninder Saini, Meredith Sanders, Soyoung Lee, Sue Kim, Travis Couture
cs.AI

Abstract

This technical report introduces Pegasus-1, a multimodal language model specialized in video content understanding and interaction through natural language. Pegasus-1 is designed to address the unique challenges posed by video data, such as interpreting spatiotemporal information, to offer nuanced comprehension of video content across various lengths. The report gives an overview of Pegasus-1's architecture, its training strategies, and its performance on benchmarks for video conversation, zero-shot video question answering, and video summarization. We also explore qualitative characteristics of Pegasus-1, demonstrating its capabilities as well as its limitations, in order to provide readers with a balanced view of its current state and its future direction.
