[Submitted on 18 Jan 2024]
The Manga Whisperer: Automatically Generating Transcriptions for Comics
Ragav Sachdeva, Andrew Zisserman
In the past few decades, Japanese comics, commonly referred to as Manga, have transcended both cultural and linguistic boundaries to become a true worldwide sensation. Yet, the inherent reliance on visual cues and illustration within manga renders it largely inaccessible to individuals with visual impairments. In this work, we seek to address this substantial barrier, with the aim of ensuring that manga can be appreciated and actively engaged with by everyone. Specifically, we tackle the problem of diarisation, i.e. generating a transcription of who said what and when, in a fully automatic way.
To this end, we make the following contributions: (1) we present a unified model, Magi, that is able to (a) detect panels, text boxes and character boxes, (b) cluster characters by identity (without knowing the number of clusters a priori), and (c) associate dialogues with their speakers; (2) we propose a novel approach that is able to sort the detected text boxes in their reading order and generate a dialogue transcript; (3) we annotate an evaluation benchmark for this task using publicly available [English] manga pages. The code, evaluation datasets and the pre-trained model can be found at: this https URL.
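To make two of the tasks above concrete, here is a minimal, self-contained Python sketch. It is an illustration only and not Magi's implementation: the function names, the similarity threshold, and the "distance from the panel's top-right corner" ordering heuristic are all assumptions chosen for this example. It shows (i) grouping character crops by identity without fixing the number of clusters in advance, by linking any pair whose similarity exceeds a threshold and taking connected components, and (ii) ordering text boxes for right-to-left reading, given panels that are already in reading order.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), origin at top-left


def cluster_by_threshold(similarity: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Hypothetical helper: link pairs with similarity > threshold and
    return one cluster label per item. The number of clusters is not an
    input; it falls out of the connected components."""
    n = len(similarity)
    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i][j] > threshold:
                parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    labels: dict = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


def intersection_area(a: Box, b: Box) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


def reading_order(panels: List[Box], texts: List[Box]) -> List[int]:
    """Hypothetical helper: return indices of `texts` in reading order.
    Assumes `panels` is already sorted in reading order. Each text box is
    assigned to the panel it overlaps most (panel 0 if it overlaps none),
    then ordered by distance from that panel's top-right corner."""

    def sort_key(t_idx: int) -> Tuple[int, float]:
        t = texts[t_idx]
        p_idx = max(range(len(panels)), key=lambda p: intersection_area(panels[p], t))
        corner_x, corner_y = panels[p_idx][2], panels[p_idx][1]
        cx, cy = (t[0] + t[2]) / 2, (t[1] + t[3]) / 2
        return (p_idx, math.hypot(corner_x - cx, corner_y - cy))

    return sorted(range(len(texts)), key=sort_key)


# Toy data: three character crops, two side-by-side panels (right panel first).
toy_similarity = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]
toy_panels = [(100, 0, 200, 100), (0, 0, 100, 100)]
toy_texts = [(10, 10, 40, 30), (150, 60, 190, 90), (160, 5, 195, 25)]

print(cluster_by_threshold(toy_similarity))  # [0, 0, 1]
print(reading_order(toy_panels, toy_texts))  # [2, 1, 0]
```

In practice the similarity matrix would come from a learned model and the panel order from a separate sorting step; both are taken as given here to keep the sketch short.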
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2401.10224 [cs.CV] (or arXiv:2401.10224v1 [cs.CV] for this version)
Submission history
From: Ragav Sachdeva
[v1] Thu, 18 Jan 2024 18:59:09 UTC (34,898 KB)
GitHub - ragavsachdeva/magi: Generate a transcript for your favourite Manga: Detect manga characters, text blocks and panels. Order panels. Cluster characters. Match texts to their speakers. Perform OCR.
The Manga Whisperer: Automatically Generating Transcriptions for Comics
[arXiv]
Ragav Sachdeva, Andrew Zisserman
TLDR
- The model is available at HuggingFace Model Hub.
- Try it out for yourself using this HuggingFace Spaces Demo (no GPU, so slow).
- Dataset is coming soon.
- Basic model usage is provided below; more details to follow.