Publication: A comprehensive gold standard and benchmark for comics text detection and recognition
Program
KU Authors
Co-Authors
Publication Date
Language
Embargo Status
Journal Title
Journal ISSN
Volume Title
Alternative Title
Abstract
This study focuses on improving the optical character recognition (OCR) data for panels in COMICS [18], the largest dataset containing text and images from comic books. To do this, we developed a pipeline for OCR processing and labeling of comic books and created the first text detection and recognition datasets for Western comics, called "COMICS Text+: Detection" and "COMICS Text+: Recognition". We evaluated the performance of fine-tuned state-of-the-art text detection and recognition models on these datasets and found significant improvement in word accuracy and normalized edit distance compared to the text in COMICS. We also created a new dataset called "COMICS Text+", which contains the extracted text from the textboxes in COMICS. Using the improved text data of COMICS Text+ in the comics processing model from COMICS resulted in state-of-the-art performance on cloze-style tasks without changing the model architecture. The COMICS Text+ can be a valuable resource for researchers working on tasks including text detection, recognition, and high-level processing of comics, such as narrative understanding, character relations, and story generation. All data, models, and instructions can be accessed online (https://github.com/gsoykan/comics_text_plus).
Source
Publisher
Springer International Publishing AG
Subject
Computer science
Citation
Has Part
Source
DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024 WORKSHOPS, PT I
Book Series Title
Edition
DOI
10.1007/978-3-031-70645-5_12