Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names
August 1, 2024
Authors: Ragav Sachdeva, Gyungin Shin, Andrew Zisserman
cs.AI
Abstract
Enabling engagement of manga by visually impaired individuals presents a
significant challenge due to its inherently visual nature. With the goal of
fostering accessibility, this paper aims to generate a dialogue transcript of a
complete manga chapter, entirely automatically, with a particular emphasis on
ensuring narrative consistency. This entails identifying (i) what is being
said, i.e., detecting the texts on each page and classifying them into
essential vs non-essential, and (ii) who is saying it, i.e., attributing each
dialogue to its speaker, while ensuring the same characters are named
consistently throughout the chapter.
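The two sub-tasks above naturally yield a structured transcript: each detected text is kept or filtered by its essential/non-essential label, and each kept line carries a consistently named speaker. A minimal sketch of such a transcript record follows; the field names and rendering are illustrative assumptions, not the paper's actual output format.

```python
from dataclasses import dataclass

# Illustrative sketch only: field names are assumptions, not Magiv2's schema.
@dataclass
class DialogueLine:
    page: int        # page number within the chapter
    text: str        # detected text content
    essential: bool  # essential dialogue vs non-essential (e.g. sound effects)
    speaker: str     # character name, kept consistent across the chapter

def chapter_transcript(lines):
    """Render essential dialogue as a readable chapter-wide transcript."""
    return "\n".join(f"{l.speaker}: {l.text}" for l in lines if l.essential)

lines = [
    DialogueLine(1, "We finally made it.", True, "Aiko"),
    DialogueLine(1, "WHOOSH", False, ""),  # sound effect, filtered out
    DialogueLine(2, "Made it where, exactly?", True, "Ren"),
]
print(chapter_transcript(lines))
# Aiko: We finally made it.
# Ren: Made it where, exactly?
```

Filtering on the essential flag is what lets a screen reader skip onomatopoeia and background text while preserving the narrative thread.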
To this end, we introduce: (i) Magiv2, a model that is capable of generating
high-quality chapter-wide manga transcripts with named characters and
significantly higher precision in speaker diarisation over prior works; (ii) an
extension of the PopManga evaluation dataset, which now includes annotations
for speech-bubble tail boxes, associations of text to corresponding tails,
classifications of text as essential or non-essential, and the identity for
each character box; and (iii) a new character bank dataset, which comprises
over 11K characters from 76 manga series, featuring 11.5K exemplar character
images in total, as well as a list of chapters in which they appear. The code,
trained model, and both datasets can be found at:
https://github.com/ragavsachdeva/magi
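The character bank described in contribution (iii) pairs each character with exemplar images and the chapters in which they appear. A hypothetical layout mirroring that description is sketched below; the nesting and field names are assumptions for illustration, not the dataset's actual schema.

```python
# Hypothetical character-bank layout: series -> character -> exemplars/chapters.
# Names and file paths below are placeholders, not entries from the dataset.
character_bank = {
    "Example Series": {                     # one of the 76 manga series
        "Aiko": {
            "exemplars": ["aiko_01.png"],   # exemplar crop(s) of the character
            "chapters": [1, 2, 5],          # chapters the character appears in
        },
    },
}

def exemplars_for(bank, series, name):
    """Look up the exemplar images used to assign names to detected characters."""
    return bank.get(series, {}).get(name, {}).get("exemplars", [])

print(exemplars_for(character_bank, "Example Series", "Aiko"))
# ['aiko_01.png']
```

A per-series bank like this is what allows the model to name characters consistently across a whole chapter rather than per page.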