Publication:
Spatially augmented speech bubble to character association via comic multi-task learning

dc.conference.date: AUG 30-SEP 04, 2024
dc.conference.location: Athens, Greece
dc.conference.organizer: 18th International Conference on Document Analysis and Recognition (ICDAR)
dc.contributor.department: Department of Computer Engineering
dc.contributor.department: KUIS AI (Koç University & İş Bank Artificial Intelligence Center)
dc.contributor.facultymember: Yes
dc.contributor.kuauthor: Sezgin, Tevfik Metin
dc.contributor.kuauthor: Soykan, Gürkan
dc.contributor.kuauthor: Yüret, Deniz
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.contributor.schoolcollegeinstitute: Research Center
dc.date.accessioned: 2025-03-06T20:58:06Z
dc.date.issued: 2024
dc.description.abstract: Accurately associating speech bubbles with their corresponding characters is a challenging yet crucial task in comic book processing. The problem is receiving increasing attention because solving it improves the accessibility and analyzability of this rapidly growing medium. Current methods often struggle with the complex spatial relationships within comic panels, leading to inconsistent associations. To address these shortcomings, we developed a robust machine learning framework that leverages novel negative sampling methods, optimized pair-pool processes (i.e., the way speech bubble-character pairs are selected during training) based on intra-panel spatial relationships, and an innovative masking strategy designed specifically for the relation branch of our model. Our approach builds upon and significantly enhances the COMIC MTL framework, improving its efficiency and accuracy in handling the unique challenges of comic book analysis. Finally, extensive experiments demonstrate that our model achieves state-of-the-art performance in linking characters to their speech bubbles. Moreover, through meticulous optimization of each component, from data preprocessing to the neural network architecture, our method yields notable improvements in character face and body detection, as well as in speech bubble segmentation.
dc.description.fulltext: No
dc.description.harvestedfrom: Manual
dc.description.indexedby: WOS
dc.description.indexedby: Scopus
dc.description.openaccess: N/A
dc.description.peerreviewstatus: N/A
dc.description.publisherscope: International
dc.description.readpublish: N/A
dc.description.sponsoredbyTubitakEu: N/A
dc.description.studentonlypublication: No
dc.description.studentpublication: Yes
dc.description.version: N/A
dc.identifier.doi: 10.1007/978-3-031-70645-5_15
dc.identifier.eissn: 1611-3349
dc.identifier.embargo: N/A
dc.identifier.isbn: 9783031706448
dc.identifier.isbn: 9783031706455
dc.identifier.issn: 0302-9743
dc.identifier.quartile: Q4
dc.identifier.scopus: 2-s2.0-85204601637
dc.identifier.uri: https://doi.org/10.1007/978-3-031-70645-5_15
dc.identifier.uri: https://hdl.handle.net/20.500.14288/27361
dc.identifier.volume: 14935
dc.identifier.wos: 001336400200015
dc.keywords: Speech bubble association
dc.keywords: Speech bubble to character association
dc.keywords: Deep learning for comics
dc.keywords: Comic book analysis
dc.keywords: Multi-task learning
dc.language.iso: eng
dc.publisher: Springer Nature
dc.relation.affiliation: Koç University
dc.relation.collection: Koç University Institutional Repository
dc.relation.ispartof: Document Analysis and Recognition - ICDAR 2024 Workshops, Part I
dc.relation.openaccess: N/A
dc.rights: N/A
dc.subject: Computer science
dc.title: Spatially augmented speech bubble to character association via comic multi-task learning
dc.type: Conference Proceeding
dspace.entity.type: Publication
relation.isOrgUnitOfPublication: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication: 77d67233-829b-4c3a-a28f-bd97ab5c12c7
relation.isOrgUnitOfPublication.latestForDiscovery: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isParentOrgUnitOfPublication: 8e756b23-2d4a-4ce8-b1b3-62c794a8c164
relation.isParentOrgUnitOfPublication: d437580f-9309-4ecb-864a-4af58309d287
relation.isParentOrgUnitOfPublication.latestForDiscovery: 8e756b23-2d4a-4ce8-b1b3-62c794a8c164