Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names

Abstract


Enabling visually impaired individuals to engage with manga presents a significant challenge, because manga is an inherently visual medium. With the goal of fostering accessibility, this paper aims to generate a dialogue transcript of a complete manga chapter, entirely automatically, with a particular emphasis on ensuring narrative consistency. This entails identifying (i) what is being said, i.e., detecting the texts on each page and classifying them as essential or non-essential, and (ii) who is saying it, i.e., attributing each dialogue to its speaker, while ensuring that the same characters are named consistently throughout the chapter. To this end, we introduce: (i) Magiv2, a model capable of generating high-quality chapter-wide manga transcripts with named characters and with significantly higher precision in speaker diarisation than prior work; (ii) an extension of the PopManga evaluation dataset, which now includes annotations for speech-bubble tail boxes, associations of texts to their corresponding tails, classifications of texts as essential or non-essential, and the identity of each character box; and (iii) a new character bank dataset, which comprises over 11K characters from 76 manga series, featuring 11.5K exemplar character images in total, along with a list of the chapters in which each character appears. The code, trained model, and both datasets can be found at: https://github.com/ragavsachdeva/magi
