Publication: A comprehensive gold standard and benchmark for comics text detection and recognition
KU-Authors
Yüret, Deniz
Sezgin, Tevfik Metin
Soykan, Gürkan
Abstract
This study focuses on improving the optical character recognition (OCR) data for panels in COMICS [18], the largest dataset containing text and images from comic books. To do this, we developed a pipeline for OCR processing and labeling of comic books and created the first text detection and recognition datasets for Western comics, called "COMICS Text+: Detection" and "COMICS Text+: Recognition". We fine-tuned state-of-the-art text detection and recognition models on these datasets and found that their output significantly improves on the original text in COMICS in both word accuracy and normalized edit distance. We also created a new dataset called "COMICS Text+", which contains the text extracted from the textboxes in COMICS. Using the improved text data of COMICS Text+ in the comics processing model from COMICS resulted in state-of-the-art performance on cloze-style tasks without changing the model architecture. COMICS Text+ can be a valuable resource for researchers working on text detection, text recognition, and high-level processing of comics, such as narrative understanding, character relations, and story generation. All data, models, and instructions are available online (https://github.com/gsoykan/comics_text_plus).
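The two recognition metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact definitions used, so this assumes a common convention: word accuracy as the fraction of exact string matches, and normalized edit distance as the Levenshtein distance divided by the length of the longer string, reported as one minus its mean. The function names and sample strings are hypothetical and are not taken from the COMICS Text+ repository.

```python
from typing import Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred: str, truth: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string (assumed convention)."""
    longest = max(len(pred), len(truth))
    if longest == 0:
        return 0.0
    return edit_distance(pred, truth) / longest


def evaluate(pairs: Sequence[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (word accuracy, 1 - mean normalized edit distance)."""
    if not pairs:
        raise ValueError("at least one (prediction, ground truth) pair is required")
    n = len(pairs)
    word_accuracy = sum(pred == truth for pred, truth in pairs) / n
    mean_ned = sum(normalized_edit_distance(p, t) for p, t in pairs) / n
    return word_accuracy, 1.0 - mean_ned


if __name__ == "__main__":
    # Hypothetical OCR outputs paired with ground-truth transcriptions.
    samples = [("POW!", "POW!"), ("P0W!", "POW!")]
    print(evaluate(samples))
```

Under this convention a single misread character in a short word counts as a full miss for word accuracy but only a partial penalty for normalized edit distance, which is why the two are usually reported together.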
Source:
Document Analysis and Recognition - ICDAR 2024 Workshops, Part I
Publisher:
Springer International Publishing AG
Subject
Computer science