Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names

August 1, 2024
Authors: Ragav Sachdeva, Gyungin Shin, Andrew Zisserman
cs.AI

Abstract

Enabling visually impaired individuals to engage with manga is a significant challenge because the medium is inherently visual. With the goal of fostering accessibility, this paper aims to generate a dialogue transcript of a complete manga chapter, entirely automatically, with a particular emphasis on ensuring narrative consistency. This entails identifying (i) what is being said, i.e., detecting the texts on each page and classifying them as essential or non-essential, and (ii) who is saying it, i.e., attributing each dialogue to its speaker while ensuring that the same characters are named consistently throughout the chapter. To this end, we introduce: (i) Magiv2, a model capable of generating high-quality chapter-wide manga transcripts with named characters and significantly higher precision in speaker diarisation than prior works; (ii) an extension of the PopManga evaluation dataset, which now includes annotations for speech-bubble tail boxes, associations of text to corresponding tails, classifications of text as essential or non-essential, and the identity of each character box; and (iii) a new character bank dataset, which comprises over 11K characters from 76 manga series, featuring 11.5K exemplar character images in total, as well as a list of chapters in which they appear. The code, trained model, and both datasets can be found at: https://github.com/ragavsachdeva/magi
