Publication: Spatially augmented speech bubble to character association via comic multi-task learning
dc.contributor.department | Department of Computer Engineering | |
dc.contributor.department | Graduate School of Sciences and Engineering | |
dc.contributor.kuauthor | Sezgin, Tevfik Metin | |
dc.contributor.kuauthor | Soykan, Gürkan | |
dc.contributor.kuauthor | Yüret, Deniz | |
dc.contributor.schoolcollegeinstitute | College of Engineering | |
dc.contributor.schoolcollegeinstitute | GRADUATE SCHOOL OF SCIENCES AND ENGINEERING | |
dc.date.accessioned | 2025-03-06T20:58:06Z | |
dc.date.issued | 2024 | |
dc.description.abstract | Accurately associating speech bubbles with their corresponding characters is a challenging yet crucial task in comic book processing. The problem is gaining increased attention because it enhances the accessibility and analyzability of this rapidly growing medium. Current methods often struggle with the complex spatial relationships within comic panels, leading to inconsistent associations. To address these shortcomings, we developed a robust machine learning framework that leverages novel negative sampling methods, optimized pair-pool processes (the selection of speech bubble-character pairs during training) based on intra-panel spatial relationships, and an innovative masking strategy designed specifically for the relation branch of our model. Our approach builds upon and significantly enhances the COMIC MTL framework, improving its efficiency and accuracy in handling the unique challenges of comic book analysis. Extensive experiments demonstrate that our model achieves state-of-the-art performance in linking characters to their speech bubbles. Moreover, through meticulous optimization of each component, from data preprocessing to neural network architecture, our method shows notable improvements in character face and body detection, as well as speech bubble segmentation. | |
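dc.description.note | The abstract's intra-panel pair-pool selection with negative sampling can be illustrated with a minimal sketch. This is not the paper's implementation; all names (Box, pair_pool, max_negatives) and the distance-based hard-negative heuristic are illustrative assumptions.

```python
# Hypothetical sketch: build speech bubble-character training pairs within one panel,
# keeping ground-truth links as positives and sampling spatially close negatives.
import random
from dataclasses import dataclass


@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float

    def center(self):
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


def center_distance(a: Box, b: Box) -> float:
    (ax, ay), (bx, by) = a.center(), b.center()
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5


def pair_pool(bubbles, characters, gt_links, max_negatives=3, seed=0):
    """Select training pairs from a single panel.

    bubbles, characters: lists of Box objects detected in the panel.
    gt_links: set of (bubble_idx, char_idx) ground-truth associations.
    Positives are the annotated pairs; negatives are drawn from the remaining
    intra-panel pairs, biased toward spatially close (harder) candidates.
    """
    rng = random.Random(seed)
    positives = [(b, c, 1) for b, c in gt_links]

    candidates = [
        (b, c)
        for b in range(len(bubbles))
        for c in range(len(characters))
        if (b, c) not in gt_links
    ]
    # Hard-negative bias: rank candidates by bubble-to-character center distance.
    candidates.sort(key=lambda bc: center_distance(bubbles[bc[0]], characters[bc[1]]))

    k = min(max_negatives * max(len(gt_links), 1), len(candidates))
    negatives = [(b, c, 0) for b, c in rng.sample(candidates[: 2 * k], k)] if k else []

    return positives + negatives
``` | |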
dc.description.indexedby | Scopus | |
dc.description.publisherscope | International | |
dc.description.sponsoredbyTubitakEu | N/A | |
dc.identifier.doi | 10.1007/978-3-031-70645-5_15 | |
dc.identifier.eissn | 1611-3349 | |
dc.identifier.isbn | 9783031706448 | |
dc.identifier.isbn | 9783031706455 | |
dc.identifier.issn | 0302-9743 | |
dc.identifier.quartile | N/A | |
dc.identifier.scopus | 2-s2.0-85204601637 | |
dc.identifier.uri | https://doi.org/10.1007/978-3-031-70645-5_15 | |
dc.identifier.uri | https://hdl.handle.net/20.500.14288/27361 | |
dc.identifier.volume | 14935 | |
dc.identifier.wos | 1336400200015 | |
dc.keywords | Speech bubble association | |
dc.keywords | Speech bubble to character association | |
dc.keywords | Deep learning for comics | |
dc.keywords | Comic book analysis | |
dc.keywords | Multi-task learning | |
dc.language.iso | eng | |
dc.publisher | Springer International Publishing AG | |
dc.relation.ispartof | DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024 WORKSHOPS, PT I | |
dc.subject | Computer science | |
dc.title | Spatially augmented speech bubble to character association via comic multi-task learning | |
dc.type | Conference Proceeding | |
dspace.entity.type | Publication | |
local.publication.orgunit1 | College of Engineering | |
local.publication.orgunit1 | GRADUATE SCHOOL OF SCIENCES AND ENGINEERING | |
local.publication.orgunit2 | Department of Computer Engineering | |
local.publication.orgunit2 | Graduate School of Sciences and Engineering | |
relation.isOrgUnitOfPublication | 89352e43-bf09-4ef4-82f6-6f9d0174ebae | |
relation.isOrgUnitOfPublication | 3fc31c89-e803-4eb1-af6b-6258bc42c3d8 | |
relation.isOrgUnitOfPublication.latestForDiscovery | 89352e43-bf09-4ef4-82f6-6f9d0174ebae | |
relation.isParentOrgUnitOfPublication | 8e756b23-2d4a-4ce8-b1b3-62c794a8c164 | |
relation.isParentOrgUnitOfPublication | 434c9663-2b11-4e66-9399-c863e2ebae43 | |
relation.isParentOrgUnitOfPublication.latestForDiscovery | 8e756b23-2d4a-4ce8-b1b3-62c794a8c164 |