Publication: ComicBERT: A transformer model and pre-training strategy for contextual understanding in comics
dc.contributor.department | Department of Computer Engineering | |
dc.contributor.department | Graduate School of Sciences and Engineering | |
dc.contributor.kuauthor | Sezgin, Tevfik Metin | |
dc.contributor.kuauthor | Soykan, Gürkan | |
dc.contributor.kuauthor | Yüret, Deniz | |
dc.contributor.schoolcollegeinstitute | College of Engineering | |
dc.contributor.schoolcollegeinstitute | Graduate School of Sciences and Engineering | |
dc.date.accessioned | 2025-03-06T20:58:06Z | |
dc.date.issued | 2024 | |
dc.description.abstract | Despite growing interest in digital comic processing, foundation models tailored to this medium remain largely unexplored. Existing methods employ multimodal sequential models with cloze-style tasks, but they fall short of human-like understanding. Addressing this gap, we introduce a novel transformer-based architecture, Comicsformer, and a comprehensive framework, ComicBERT, designed to process and understand the complex interplay of visual and textual elements in comics. Our approach trains the foundation model with a self-supervised objective, Masked Comic Modeling, inspired by the masked language modeling objective of BERT [6]. To fine-tune and validate our models, we adopt existing cloze-style tasks and propose new ones, such as scene-cloze, that better capture the narrative and contextual intricacies unique to comics. Preliminary experiments indicate that these tasks enhance the model's predictive accuracy and may provide new tools for comic creators, aiding in character dialogue generation and panel sequencing. Ultimately, ComicBERT aims to serve as a universal comic processor. | |
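Note: the abstract describes a BERT-style Masked Comic Modeling objective. The sketch below is a minimal, hypothetical illustration of such an objective, not the authors' implementation: it assumes panels and dialogue have already been embedded into a shared vector space, and it reconstructs masked elements with a small transformer encoder under a regression loss. All names (MaskedComicModel), dimensions, and the choice of loss are assumptions for illustration only.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MaskedComicModel(nn.Module):
        """Hypothetical BERT-style masked-modeling sketch for comic sequences."""
        def __init__(self, d_model=256, n_heads=4, n_layers=2):
            super().__init__()
            self.mask_token = nn.Parameter(torch.zeros(d_model))  # learned [MASK] embedding
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.head = nn.Linear(d_model, d_model)  # reconstruct held-out embeddings

        def forward(self, seq, mask):
            # seq:  (batch, n_elements, d_model) pre-embedded panels/dialogue
            # mask: (batch, n_elements) bool, True where an element is hidden
            x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(seq), seq)
            pred = self.head(self.encoder(x))
            # Regression variant: predict the original embedding of each masked
            # element; a classification or contrastive head is equally plausible.
            return F.mse_loss(pred[mask], seq[mask])

    # Usage: mask roughly 15% of sequence elements, as in BERT-style pre-training.
    batch, n, d = 2, 8, 256
    seq = torch.randn(batch, n, d)
    mask = torch.rand(batch, n) < 0.15
    mask[:, 0] = True  # guarantee at least one masked element per sequence
    loss = MaskedComicModel(d_model=d)(seq, mask)
    loss.backward()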
dc.description.indexedby | Scopus | |
dc.description.publisherscope | International | |
dc.description.sponsoredbyTubitakEu | N/A | |
dc.identifier.doi | 10.1007/978-3-031-70645-5_16 | |
dc.identifier.eissn | 1611-3349 | |
dc.identifier.isbn | 9783031706448 | |
dc.identifier.isbn | 9783031706455 | |
dc.identifier.issn | 0302-9743 | |
dc.identifier.quartile | N/A | |
dc.identifier.scopus | 2-s2.0-85204522420 | |
dc.identifier.uri | https://doi.org/10.1007/978-3-031-70645-5_16 | |
dc.identifier.uri | https://hdl.handle.net/20.500.14288/27360 | |
dc.identifier.volume | 14935 | |
dc.identifier.wos | 1336400200016 | |
dc.keywords | Digital comics processing | |
dc.keywords | Transformer architectures | |
dc.keywords | Self-supervised learning | |
dc.keywords | Cloze-style tasks | |
dc.keywords | Neural comic understanding | |
dc.language.iso | eng | |
dc.publisher | Springer International Publishing AG | |
dc.relation.ispartof | Document Analysis and Recognition - ICDAR 2024 Workshops, Part I | |
dc.subject | Computer science | |
dc.title | ComicBERT: A transformer model and pre-training strategy for contextual understanding in comics | |
dc.type | Conference Proceeding | |
dspace.entity.type | Publication | |
local.publication.orgunit1 | College of Engineering | |
local.publication.orgunit1 | Graduate School of Sciences and Engineering | |
local.publication.orgunit2 | Department of Computer Engineering | |
local.publication.orgunit2 | Graduate School of Sciences and Engineering | |
relation.isOrgUnitOfPublication | 89352e43-bf09-4ef4-82f6-6f9d0174ebae | |
relation.isOrgUnitOfPublication | 3fc31c89-e803-4eb1-af6b-6258bc42c3d8 | |
relation.isOrgUnitOfPublication.latestForDiscovery | 89352e43-bf09-4ef4-82f6-6f9d0174ebae | |
relation.isParentOrgUnitOfPublication | 8e756b23-2d4a-4ce8-b1b3-62c794a8c164 | |
relation.isParentOrgUnitOfPublication | 434c9663-2b11-4e66-9399-c863e2ebae43 | |
relation.isParentOrgUnitOfPublication.latestForDiscovery | 8e756b23-2d4a-4ce8-b1b3-62c794a8c164 |