MovieSum: An Abstractive Summarization Dataset for Movie Screenplays

August 12, 2024
Authors: Rohit Saxena, Frank Keller
cs.AI

Abstract

Movie screenplay summarization is challenging, as it requires an understanding of long input contexts and various elements unique to movies. Large language models have shown significant advancements in document summarization, but they often struggle with processing long input contexts. Furthermore, while television transcripts have received attention in recent studies, movie screenplay summarization remains underexplored. To stimulate research in this area, we present a new dataset, MovieSum, for abstractive summarization of movie screenplays. This dataset comprises 2200 movie screenplays accompanied by their Wikipedia plot summaries. We manually formatted the movie screenplays to represent their structural elements. Compared to existing datasets, MovieSum possesses several distinctive features: (1) It includes movie screenplays, which are longer than scripts of TV episodes. (2) It is twice the size of previous movie screenplay datasets. (3) It provides metadata with IMDb IDs to facilitate access to additional external knowledge. We also show the results of recently released large language models applied to summarization on our dataset to provide a detailed baseline.
