Publication: Spatially augmented speech bubble to character association via comic multi-task learning
dc.contributor.department | Department of Computer Engineering | |
dc.contributor.department | Graduate School of Sciences and Engineering | |
dc.contributor.kuauthor | Sezgin, Tevfik Metin | |
dc.contributor.kuauthor | Soykan, Gürkan | |
dc.contributor.kuauthor | Yüret, Deniz | |
dc.contributor.schoolcollegeinstitute | College of Engineering | |
dc.contributor.schoolcollegeinstitute | GRADUATE SCHOOL OF SCIENCES AND ENGINEERING | |
dc.date.accessioned | 2025-03-06T20:58:06Z | |
dc.date.issued | 2024 | |
dc.description.abstract | Accurately associating speech bubbles with their corresponding characters is a challenging yet crucial task in comic book processing. The problem is gaining increased attention because it enhances the accessibility and analyzability of this rapidly growing medium. Current methods often struggle with the complex spatial relationships within comic panels, leading to inconsistent associations. To address these shortcomings, we developed a robust machine learning framework that leverages novel negative sampling methods, optimized pair-pool processes (the selection of speech bubble-character pairs during training) based on intra-panel spatial relationships, and an innovative masking strategy designed specifically for the relation branch of our model. Our approach builds upon and significantly enhances the COMIC MTL framework, improving its efficiency and accuracy in handling the unique challenges of comic book analysis. Extensive experiments demonstrate that our model achieves state-of-the-art performance in linking characters to their speech bubbles. Moreover, through meticulous optimization of each component, from data preprocessing to neural network architecture, our method shows notable improvements in character face and body detection, as well as speech bubble segmentation. | |
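dc.description.note | The abstract's intra-panel pair-pool selection with negative sampling can be illustrated with a minimal sketch. This is not the paper's implementation; all names (Box, pair_pool, max_negatives) and the distance-based hard-negative heuristic are illustrative assumptions.

```python
# Hypothetical sketch: build speech bubble-character training pairs within one panel,
# keeping ground-truth links as positives and sampling spatially close negatives.
import random
from dataclasses import dataclass


@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float

    def center(self):
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


def center_distance(a: Box, b: Box) -> float:
    (ax, ay), (bx, by) = a.center(), b.center()
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5


def pair_pool(bubbles, characters, gt_links, max_negatives=3, seed=0):
    """Select training pairs from a single panel.

    bubbles, characters: lists of Box objects detected in the panel.
    gt_links: set of (bubble_idx, char_idx) ground-truth associations.
    Positives are the annotated pairs; negatives are drawn from the remaining
    intra-panel pairs, biased toward spatially close (harder) candidates.
    """
    rng = random.Random(seed)
    positives = [(b, c, 1) for b, c in gt_links]

    candidates = [
        (b, c)
        for b in range(len(bubbles))
        for c in range(len(characters))
        if (b, c) not in gt_links
    ]
    # Hard-negative bias: rank candidates by bubble-to-character center distance.
    candidates.sort(key=lambda bc: center_distance(bubbles[bc[0]], characters[bc[1]]))

    k = min(max_negatives * max(len(gt_links), 1), len(candidates))
    negatives = [(b, c, 0) for b, c in rng.sample(candidates[: 2 * k], k)] if k else []

    return positives + negatives
``` | |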
dc.description.indexedby | Scopus | |
dc.description.publisherscope | International | |
dc.description.sponsoredbyTubitakEu | N/A | |
dc.identifier.doi | 10.1007/978-3-031-70645-5_15 | |
dc.identifier.eissn | 1611-3349 | |
dc.identifier.isbn | 9783031706448 | |
dc.identifier.isbn | 9783031706455 | |
dc.identifier.issn | 0302-9743 | |
dc.identifier.quartile | N/A | |
dc.identifier.scopus | 2-s2.0-85204601637 | |
dc.identifier.uri | https://doi.org/10.1007/978-3-031-70645-5_15 | |
dc.identifier.uri | https://hdl.handle.net/20.500.14288/27361 | |
dc.identifier.volume | 14935 | |
dc.identifier.wos | 1336400200015 | |
dc.keywords | Speech bubble association | |
dc.keywords | Speech bubble to character association | |
dc.keywords | Deep learning for comics | |
dc.keywords | Comic book analysis | |
dc.keywords | Multi-task learning | |
dc.language.iso | eng | |
dc.publisher | Springer International Publishing AG | |
dc.relation.ispartof | DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024 WORKSHOPS, PT I | |
dc.subject | Computer science | |
dc.title | Spatially augmented speech bubble to character association via comic multi-task learning | |
dc.type | Conference Proceeding | |
dspace.entity.type | Publication | |
local.publication.orgunit1 | College of Engineering | |
local.publication.orgunit1 | GRADUATE SCHOOL OF SCIENCES AND ENGINEERING | |
local.publication.orgunit2 | Department of Computer Engineering | |
local.publication.orgunit2 | Graduate School of Sciences and Engineering | |
relation.isOrgUnitOfPublication | 89352e43-bf09-4ef4-82f6-6f9d0174ebae | |
relation.isOrgUnitOfPublication | 3fc31c89-e803-4eb1-af6b-6258bc42c3d8 | |
relation.isOrgUnitOfPublication.latestForDiscovery | 89352e43-bf09-4ef4-82f6-6f9d0174ebae | |
relation.isParentOrgUnitOfPublication | 8e756b23-2d4a-4ce8-b1b3-62c794a8c164 | |
relation.isParentOrgUnitOfPublication | 434c9663-2b11-4e66-9399-c863e2ebae43 | |
relation.isParentOrgUnitOfPublication.latestForDiscovery | 8e756b23-2d4a-4ce8-b1b3-62c794a8c164 |