Publication: A comprehensive gold standard and benchmark for comics text detection and recognition
dc.contributor.department | Department of Computer Engineering | |
dc.contributor.department | Graduate School of Sciences and Engineering | |
dc.contributor.kuauthor | Sezgin, Tevfik Metin | |
dc.contributor.kuauthor | Soykan, Gürkan | |
dc.contributor.kuauthor | Yüret, Deniz | |
dc.contributor.schoolcollegeinstitute | College of Engineering | |
dc.contributor.schoolcollegeinstitute | GRADUATE SCHOOL OF SCIENCES AND ENGINEERING | |
dc.date.accessioned | 2025-03-06T20:58:05Z | |
dc.date.issued | 2024 | |
dc.description.abstract | This study focuses on improving the optical character recognition (OCR) data for panels in COMICS [18], the largest dataset containing text and images from comic books. To do this, we developed a pipeline for OCR processing and labeling of comic books and created the first text detection and recognition datasets for Western comics, called "COMICS Text+: Detection" and "COMICS Text+: Recognition". We evaluated fine-tuned state-of-the-art text detection and recognition models on these datasets and found significant improvements in word accuracy and normalized edit distance over the original text in COMICS. We also created a new dataset called "COMICS Text+", which contains the text extracted from the textboxes in COMICS. Feeding the improved text of COMICS Text+ to the comics processing model from COMICS achieved state-of-the-art performance on cloze-style tasks without any change to the model architecture. COMICS Text+ can be a valuable resource for researchers working on text detection, text recognition, and high-level processing of comics, such as narrative understanding, character relations, and story generation. All data, models, and instructions are available online (https://github.com/gsoykan/comics_text_plus). | |
dc.description.indexedby | Scopus | |
dc.description.publisherscope | International | |
dc.description.sponsoredbyTubitakEu | N/A | |
dc.description.sponsorship | This project is supported by the Koç University İş Bank AI Center (KUIS AI). We would like to thank KUIS AI for their support. | |
dc.identifier.doi | 10.1007/978-3-031-70645-5_12 | |
dc.identifier.eissn | 1611-3349 | |
dc.identifier.grantno | Koç University | |
dc.identifier.isbn | 9783031706448 | |
dc.identifier.isbn | 9783031706455 | |
dc.identifier.issn | 0302-9743 | |
dc.identifier.quartile | N/A | |
dc.identifier.scopus | 2-s2.0-85204597509 | |
dc.identifier.uri | https://doi.org/10.1007/978-3-031-70645-5_12 | |
dc.identifier.uri | https://hdl.handle.net/20.500.14288/27359 | |
dc.identifier.volume | 14935 | |
dc.identifier.wos | 1336400200012 | |
dc.keywords | Optical character recognition (OCR) | |
dc.keywords | Text detection | |
dc.keywords | Text recognition | |
dc.keywords | Comic processing | |
dc.language.iso | eng | |
dc.publisher | Springer International Publishing AG | |
dc.relation.ispartof | DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024 WORKSHOPS, PT I | |
dc.subject | Computer science | |
dc.title | A comprehensive gold standard and benchmark for comics text detection and recognition | |
dc.type | Conference Proceeding | |
dspace.entity.type | Publication | |
local.publication.orgunit1 | College of Engineering | |
local.publication.orgunit1 | GRADUATE SCHOOL OF SCIENCES AND ENGINEERING | |
local.publication.orgunit2 | Department of Computer Engineering | |
local.publication.orgunit2 | Graduate School of Sciences and Engineering | |
relation.isOrgUnitOfPublication | 89352e43-bf09-4ef4-82f6-6f9d0174ebae | |
relation.isOrgUnitOfPublication | 3fc31c89-e803-4eb1-af6b-6258bc42c3d8 | |
relation.isOrgUnitOfPublication.latestForDiscovery | 89352e43-bf09-4ef4-82f6-6f9d0174ebae | |
relation.isParentOrgUnitOfPublication | 8e756b23-2d4a-4ce8-b1b3-62c794a8c164 | |
relation.isParentOrgUnitOfPublication | 434c9663-2b11-4e66-9399-c863e2ebae43 | |
relation.isParentOrgUnitOfPublication.latestForDiscovery | 8e756b23-2d4a-4ce8-b1b3-62c794a8c164 |