Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names
August 1, 2024
作者: Ragav Sachdeva, Gyungin Shin, Andrew Zisserman
cs.AI
Abstract
Enabling engagement of manga by visually impaired individuals presents a
significant challenge due to its inherently visual nature. With the goal of
fostering accessibility, this paper aims to generate a dialogue transcript of a
complete manga chapter, entirely automatically, with a particular emphasis on
ensuring narrative consistency. This entails identifying (i) what is being
said, i.e., detecting the texts on each page and classifying them into
essential vs non-essential, and (ii) who is saying it, i.e., attributing each
dialogue to its speaker, while ensuring the same characters are named
consistently throughout the chapter.
To this end, we introduce: (i) Magiv2, a model that is capable of generating
high-quality chapter-wide manga transcripts with named characters and
significantly higher precision in speaker diarisation over prior works; (ii) an
extension of the PopManga evaluation dataset, which now includes annotations
for speech-bubble tail boxes, associations of text to corresponding tails,
classifications of text as essential or non-essential, and the identity for
each character box; and (iii) a new character bank dataset, which comprises
over 11K characters from 76 manga series, featuring 11.5K exemplar character
images in total, as well as a list of chapters in which they appear. The code,
trained model, and both datasets can be found at:
https://github.com/ragavsachdeva/magi
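
To make the task concrete, the following is a minimal sketch of how a chapter-wide transcript could be assembled once texts have been detected, classified as essential or non-essential, and attributed to speakers. It is an illustration only and not the Magiv2 implementation or its API: the `TextBox` record, the `build_transcript` function, the `names` mapping, and the `<unknown>` placeholder are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TextBox:
    """Hypothetical record for one detected text box (not the Magiv2 format)."""
    page: int                  # page index within the chapter
    text: str                  # recognised text content
    essential: bool            # True for dialogue, False for e.g. sound effects
    speaker_id: Optional[int]  # chapter-wide character id, None if unattributed


def build_transcript(boxes: List[TextBox], names: Dict[int, str]) -> List[str]:
    """Return 'Name: text' lines for essential texts, ordered by page."""
    lines = []
    # sorted() is stable, so the input order of boxes within a page is preserved.
    for box in sorted(boxes, key=lambda b: b.page):
        if not box.essential:
            continue  # skip non-essential text
        # One shared id-to-name mapping keeps names consistent across pages.
        speaker = names.get(box.speaker_id, "<unknown>")
        lines.append(f"{speaker}: {box.text}")
    return lines
```

Because every page looks up speakers in the same `names` mapping, a character identified on page 0 is given the same name on every later page, which is the consistency property the abstract emphasises.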