Publication:
ComicBERT: A transformer model and pre-training strategy for contextual understanding in comics

dc.contributor.department: Department of Computer Engineering
dc.contributor.department: Graduate School of Sciences and Engineering
dc.contributor.kuauthor: Sezgin, Tevfik Metin
dc.contributor.kuauthor: Soykan, Gürkan
dc.contributor.kuauthor: Yüret, Deniz
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.contributor.schoolcollegeinstitute: GRADUATE SCHOOL OF SCIENCES AND ENGINEERING
dc.date.accessioned: 2025-03-06T20:58:06Z
dc.date.issued: 2024
dc.description.abstract: Despite growing interest in digital comic processing, foundation models tailored to this medium remain largely unexplored. Existing methods employ multimodal sequential models with cloze-style tasks, but they fall short of human-like understanding. Addressing this gap, we introduce a novel transformer-based architecture, Comicsformer, and a comprehensive framework, ComicBERT, designed to process and understand the complex interplay of visual and textual elements in comics. Our approach uses a self-supervised objective, Masked Comic Modeling, inspired by BERT's [6] masked language modeling objective, to train the foundation model. To fine-tune and validate our models, we adopt existing cloze-style tasks and propose new ones, such as scene-cloze, that better capture the narrative and contextual intricacies unique to comics. Preliminary experiments indicate that these tasks enhance the model's predictive accuracy and may provide new tools for comic creators, aiding in character dialogue generation and panel sequencing. Ultimately, ComicBERT aims to serve as a universal comic processor.
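The Masked Comic Modeling objective described in the abstract follows the BERT recipe at the panel level: hide a subset of panels in a sequence and train a transformer to reconstruct them from context. Below is a minimal PyTorch sketch under that reading, assuming precomputed joint visual-textual panel embeddings and a regression loss on masked positions; the class and function names, masking ratio, and model sizes are illustrative assumptions, not the paper's actual Comicsformer implementation.

import torch
import torch.nn as nn

class MaskedComicModel(nn.Module):
    """Hypothetical panel-level masked-modeling transformer (not the paper's architecture)."""
    def __init__(self, dim=256, n_heads=4, n_layers=2):
        super().__init__()
        self.mask_token = nn.Parameter(torch.zeros(dim))  # learned [MASK] embedding
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(dim, dim)  # predicts the original panel embedding

    def forward(self, panels, mask):
        # panels: (batch, seq, dim) precomputed panel embeddings
        # mask:   (batch, seq) bool, True where a panel is hidden
        x = torch.where(mask.unsqueeze(-1), self.mask_token, panels)
        return self.head(self.encoder(x))

def masked_comic_loss(model, panels, mask_ratio=0.15):
    # Sample a random mask; guarantee at least one masked panel per sequence.
    mask = torch.rand(panels.shape[:2]) < mask_ratio
    mask[:, 0] |= ~mask.any(dim=1)
    pred = model(panels, mask)
    # Score reconstructions only at masked positions, as in BERT-style training.
    return nn.functional.mse_loss(pred[mask], panels[mask])

model = MaskedComicModel()
panels = torch.randn(8, 6, 256)   # e.g. 8 pages, 6 panels each, 256-dim embeddings
loss = masked_comic_loss(model, panels)
loss.backward()

The design mirrors masked language modeling: a learned mask embedding replaces hidden inputs, and only masked positions contribute to the loss, so the model must use surrounding panels to infer the missing content.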
dc.description.indexedby: Scopus
dc.description.publisherscope: International
dc.description.sponsoredbyTubitakEu: N/A
dc.identifier.doi: 10.1007/978-3-031-70645-5_16
dc.identifier.eissn: 1611-3349
dc.identifier.isbn: 9783031706448
dc.identifier.isbn: 9783031706455
dc.identifier.issn: 0302-9743
dc.identifier.quartile: N/A
dc.identifier.scopus: 2-s2.0-85204522420
dc.identifier.uri: https://doi.org/10.1007/978-3-031-70645-5_16
dc.identifier.uri: https://hdl.handle.net/20.500.14288/27360
dc.identifier.volume: 14935
dc.identifier.wos: 1336400200016
dc.keywords: Digital comics processing
dc.keywords: Transformer architectures
dc.keywords: Self-supervised learning
dc.keywords: Cloze-style tasks
dc.keywords: Neural comic understanding
dc.language.iso: eng
dc.publisher: Springer International Publishing AG
dc.relation.ispartof: DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024 WORKSHOPS, PT I
dc.subject: Computer science
dc.title: ComicBERT: A transformer model and pre-training strategy for contextual understanding in comics
dc.type: Conference Proceeding
dspace.entity.type: Publication
local.publication.orgunit1: College of Engineering
local.publication.orgunit1: GRADUATE SCHOOL OF SCIENCES AND ENGINEERING
local.publication.orgunit2: Department of Computer Engineering
local.publication.orgunit2: Graduate School of Sciences and Engineering
relation.isOrgUnitOfPublication: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication: 3fc31c89-e803-4eb1-af6b-6258bc42c3d8
relation.isOrgUnitOfPublication.latestForDiscovery: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isParentOrgUnitOfPublication: 8e756b23-2d4a-4ce8-b1b3-62c794a8c164
relation.isParentOrgUnitOfPublication: 434c9663-2b11-4e66-9399-c863e2ebae43
relation.isParentOrgUnitOfPublication.latestForDiscovery: 8e756b23-2d4a-4ce8-b1b3-62c794a8c164
