Publication:
CNN-based page segmentation and object classification for counting population in Ottoman archival documentation

dc.contributor.department: Department of History
dc.contributor.kuauthor: Kabadayı, Mustafa Erdem
dc.contributor.kuauthor: Can, Yekta Said
dc.contributor.kuprofile: Faculty Member
dc.contributor.other: Department of History
dc.contributor.schoolcollegeinstitute: College of Social Sciences and Humanities
dc.contributor.yokid: 33267
dc.contributor.yokid: N/A
dc.date.accessioned: 2024-11-09T11:59:25Z
dc.date.issued: 2020
dc.description.abstract: Historical document analysis systems gain importance with the increasing efforts in the digitalization of archives. Page segmentation and layout analysis are crucial steps for such systems. Errors in these steps affect the outcome of handwritten text recognition and Optical Character Recognition (OCR) methods, which increases the importance of page segmentation and layout analysis. Degradation of documents, digitization errors, and varying layout styles complicate the segmentation of historical documents. The properties of Arabic scripts, such as connected letters, ligatures, diacritics, and different writing styles, make it even more challenging to process historical documents written in Arabic script. In this study, we developed an automatic system for counting registered individuals and assigning them to populated places by using a CNN-based architecture. To evaluate the performance of our system, we created a labeled dataset of registers obtained from the first wave of population registers of the Ottoman Empire, held between the 1840s and 1860s. We achieved promising results for classifying different types of objects, counting individuals, and assigning them to populated places.
dc.description.fulltext: YES
dc.description.indexedby: WoS
dc.description.issue: 5
dc.description.openaccess: YES
dc.description.publisherscope: International
dc.description.sponsoredbyTubitakEu: EU
dc.description.sponsorship: European Union (European Union)
dc.description.sponsorship: Horizon 2020
dc.description.sponsorship: European Research Council (ERC)
dc.description.sponsorship: Research and Innovation Program Grant
dc.description.sponsorship: UrbanOccupationsOETR
dc.description.version: Publisher version
dc.description.volume: 6
dc.format: pdf
dc.identifier.doi: 10.3390/jimaging6050032
dc.identifier.eissn: 2313-433X
dc.identifier.embargo: NO
dc.identifier.filenameinventoryno: IR02323
dc.identifier.link: https://doi.org/10.3390/jimaging6050032
dc.identifier.quartile: N/A
dc.identifier.scopus: 2-s2.0-85089909335
dc.identifier.uri: https://hdl.handle.net/20.500.14288/922
dc.identifier.wos: 541022700002
dc.keywords: Page segmentation
dc.keywords: Historical document analysis
dc.keywords: Convolutional neural networks
dc.keywords: Arabic script layout analysis
dc.language: English
dc.publisher: Multidisciplinary Digital Publishing Institute (MDPI)
dc.relation.grantno: 679097
dc.relation.uri: http://cdm21054.contentdm.oclc.org/cdm/ref/collection/IR/id/8954
dc.source: Journal of Imaging
dc.subject: History
dc.subject: Imaging science and photographic technology
dc.title: CNN-based page segmentation and object classification for counting population in Ottoman archival documentation
dc.type: Journal Article
dspace.entity.type: Publication
local.contributor.authorid: 0000-0003-3206-0190
local.contributor.authorid: N/A
local.contributor.kuauthor: Kabadayı, Mustafa Erdem
local.contributor.kuauthor: Can, Yekta Said
relation.isOrgUnitOfPublication: be8432df-d124-44c3-85b4-be586c2db8a3
relation.isOrgUnitOfPublication.latestForDiscovery: be8432df-d124-44c3-85b4-be586c2db8a3

Files

Original bundle

Name: 8954.pdf
Size: 6.06 MB
Format: Adobe Portable Document Format