Publication:
Automatic CNN-based Arabic numeral spotting and handwritten digit recognition by using deep transfer learning in Ottoman population registers

dc.contributor.departmentDepartment of History
dc.contributor.kuauthorCan, Yekta Said
dc.contributor.kuauthorKabadayı, Mustafa Erdem
dc.contributor.schoolcollegeinstituteCollege of Social Sciences and Humanities
dc.date.accessioned2024-11-09T23:04:46Z
dc.date.issued2020
dc.description.abstractHistorical manuscripts and archival documents are handwritten texts that serve as the backbone sources for historical inquiry. Recent developments in the digital humanities and the need to extract information from historical documents have accelerated digitization. Cutting-edge machine learning methods are applied to extract meaning from these documents. Page segmentation (layout analysis), keyword, number and symbol spotting, and handwritten text recognition algorithms are tested on historical documents. For most languages, these techniques have been widely studied and high-performance methods have been developed. However, the properties of Arabic scripts (i.e., diacritics, varying script styles, and ligatures) create additional problems for these algorithms and, therefore, research on them is limited. In this research, we first automatically spotted the Arabic numerals in the very first series of population registers of the Ottoman Empire, conducted in the mid-nineteenth century, and then recognized these numbers. They are important because they hold information about the number of households, the number of registered individuals, and the ages of individuals. We applied a red color filter to separate numerals from the document, taking advantage of the structure of the studied registers (numerals are written in red). We first used a CNN-based segmentation method for spotting these numerals. In the second part, we annotated a local Arabic handwritten digit dataset from the spotted numerals by selecting the single-digit ones, and tested the Deep Transfer Learning method from large open Arabic handwritten digit datasets for digit recognition. We achieved promising results for recognizing digits in these historical documents.
dc.description.indexedbyWOS
dc.description.issue16
dc.description.openaccessYES
dc.description.publisherscopeInternational
dc.description.sponsoredbyTubitakEuN/A
dc.description.sponsorshipThis work was supported by the European Research Council (ERC) project: "Industrialisation and Urban Growth from the mid-nineteenth century Ottoman Empire to Contemporary Turkey in a Comparative Perspective, 1850-2000" under the European Union's Horizon 2020 research and innovation program Grant Agreement No. 679097, acronym UrbanOccupationsOETR. M. Erdem Kabadayi is the principal investigator of UrbanOccupationsOETR.
dc.description.volume10
dc.identifier.doi10.3390/app10165430
dc.identifier.eissn2076-3417
dc.identifier.quartileQ2
dc.identifier.scopus2-s2.0-85089886263
dc.identifier.urihttps://doi.org/10.3390/app10165430
dc.identifier.urihttps://hdl.handle.net/20.500.14288/8678
dc.identifier.wos567059900001
dc.keywordsNumeral spotting
dc.keywordsHistorical document analysis
dc.keywordsConvolutional neural networks
dc.keywordsDeep transfer learning
dc.keywordsHandwritten digit recognition
dc.language.isoeng
dc.publisherMDPI
dc.relation.ispartofApplied Sciences-Basel
dc.subjectChemistry
dc.subjectEngineering
dc.subjectMaterials science
dc.subjectPhysics
dc.subjectApplied physics
dc.titleAutomatic CNN-based Arabic numeral spotting and handwritten digit recognition by using deep transfer learning in Ottoman population registers
dc.typeJournal Article
dspace.entity.typePublication
local.contributor.kuauthorCan, Yekta Said
local.contributor.kuauthorKabadayı, Mustafa Erdem
local.publication.orgunit1College of Social Sciences and Humanities
local.publication.orgunit2Department of History
relation.isOrgUnitOfPublicationbe8432df-d124-44c3-85b4-be586c2db8a3
relation.isOrgUnitOfPublication.latestForDiscoverybe8432df-d124-44c3-85b4-be586c2db8a3
relation.isParentOrgUnitOfPublication3f7621e3-0d26-42c2-af64-58a329522794
relation.isParentOrgUnitOfPublication.latestForDiscovery3f7621e3-0d26-42c2-af64-58a329522794
