Publication:
Curation of historical Arabic handwritten digit datasets from Ottoman population registers: a deep transfer learning case study

dc.contributor.kuauthor: Can, Yekta Said
dc.contributor.kuauthor: Kabadayı, Mustafa Erdem
dc.contributor.kuprofile: Researcher
dc.contributor.kuprofile: Faculty Member
dc.contributor.schoolcollegeinstitute: College of Social Sciences and Humanities
dc.contributor.yokid: N/A
dc.contributor.yokid: 33267
dc.date.accessioned: 2024-11-09T23:12:58Z
dc.date.issued: 2020
dc.description.abstract: With the growing number of efforts to digitize historical manuscripts and archives, automatic information retrieval systems need to extract meaning quickly and reliably. Historical archives pose more challenges for these systems than modern manuscripts do. More advanced algorithms, archive-specific methods, and preprocessing techniques are needed to retrieve information. Cutting-edge machine learning algorithms should also be applied to retrieve meaning from these documents. One of the most important research issues in historical document analysis is the lack of public datasets. Although there are plenty of public datasets for modern document analysis, the number of public annotated historical archives is limited. Researchers can test novel algorithms on these modern datasets and draw some conclusions, but the algorithms' performance remains unknown until they are tested on historical datasets. In this study, we created a historical Arabic handwritten digit dataset by combining manual annotation and automatic document analysis techniques. The dataset is open to researchers and contains more than 6000 digits. We then tested deep transfer learning algorithms and various machine learning techniques to recognize these digits and achieved promising results.
dc.description.indexedby: WoS
dc.description.indexedby: Scopus
dc.description.openaccess: NO
dc.description.sponsorship: European Research Council (ERC) under the European Union [679097]. This work has been supported by the European Research Council (ERC) project "Industrialisation and Urban Growth from the mid-nineteenth century Ottoman Empire to Contemporary Turkey in a Comparative Perspective, 1850-2000" under the European Union's Horizon 2020 research and innovation programme, grant agreement No. 679097.
dc.identifier.doi: 10.1109/BigData50022.2020.9378445
dc.identifier.isbn: 978-1-7281-6251-5
dc.identifier.issn: 2639-1589
dc.identifier.scopus: 2-s2.0-85103859375
dc.identifier.uri: http://dx.doi.org/10.1109/BigData50022.2020.9378445
dc.identifier.uri: https://hdl.handle.net/20.500.14288/9901
dc.identifier.wos: 662554701117
dc.keywords: Numeral spotting
dc.keywords: Historical document analysis
dc.keywords: Convolutional neural networks
dc.keywords: Deep transfer learning
dc.keywords: Handwritten digit recognition
dc.keywords: Dataset curation
dc.keywords: Page segmentation
dc.keywords: CNN
dc.language: English
dc.publisher: IEEE
dc.source: 2020 IEEE International Conference On Big Data (Big Data)
dc.subject: Computer science
dc.subject: Artificial intelligence
dc.subject: Computer science
dc.subject: Information systems
dc.subject: Computer science
dc.subject: Theory methods
dc.title: Curation of historical Arabic handwritten digit datasets from Ottoman population registers: a deep transfer learning case study
dc.type: Conference proceeding
dspace.entity.type: Publication
local.contributor.authorid: N/A
local.contributor.authorid: 0000-0003-3206-0190
local.contributor.kuauthor: Can, Yekta Said
local.contributor.kuauthor: Kabadayı, Mustafa Erdem
