Curation of historical Arabic handwritten digit datasets from Ottoman population registers: a deep transfer learning case study


dc.contributor.kuauthor	Can, Yekta Said
dc.contributor.kuauthor	Kabadayı, Mustafa Erdem
dc.contributor.schoolcollegeinstitute	College of Social Sciences and Humanities
dc.date.accessioned	2024-11-09T23:12:58Z
dc.date.issued	2020
dc.description.abstract	With the increasing number of digitization efforts for historical manuscripts and archives, automatic information retrieval systems need to extract meaning quickly and reliably. Historical archives pose more challenges for these systems than modern manuscripts do. More advanced algorithms, archive-specific methods, and preprocessing techniques are needed to retrieve information. Cutting-edge machine learning algorithms should also be applied to retrieve meaning from these documents. One of the most important research issues in historical document analysis is the lack of public datasets. Although there are plenty of public datasets for modern document analysis, the number of public annotated historical archives is limited. Researchers can test novel algorithms on these modern datasets and infer some results, but the algorithms' performance remains unknown until they are tested on historical datasets. In this study, we created a historical Arabic handwritten digit dataset by combining manual annotation and automatic document analysis techniques. The dataset is open to researchers and contains more than 6000 digits. We then tested deep transfer learning algorithms and various machine learning techniques to recognize these digits and achieved promising results.
dc.description.indexedby	WOS
dc.description.indexedby	Scopus
dc.description.openaccess	NO
dc.description.sponsoredbyTubitakEu	N/A
dc.description.sponsorship	European Research Council (ERC) under the European Union [679097] This work has been supported by European Research Council (ERC) Project: "Industrialisation and Urban Growth from the mid-nineteenth century Ottoman Empire to Contemporary Turkey in a Comparative Perspective, 1850-2000" under the European Union's Horizon 2020 research and innovation programme grant agreement No. 679097.
dc.identifier.doi	10.1109/BigData50022.2020.9378445
dc.identifier.isbn	978-1-7281-6251-5
dc.identifier.issn	2639-1589
dc.identifier.scopus	2-s2.0-85103859375
dc.identifier.uri	https://doi.org/10.1109/BigData50022.2020.9378445
dc.identifier.uri	https://hdl.handle.net/20.500.14288/9901
dc.identifier.wos	662554701117
dc.keywords	Numeral spotting
dc.keywords	Historical document analysis
dc.keywords	Convolutional neural networks
dc.keywords	Deep transfer learning
dc.keywords	Handwritten digit recognition
dc.keywords	Dataset curation
dc.keywords	Page segmentation
dc.keywords	CNN
dc.language.iso	eng
dc.publisher	IEEE
dc.relation.ispartof	2020 IEEE International Conference On Big Data (Big Data)
dc.subject	Computer science, Artificial intelligence
dc.subject	Computer science, Information systems
dc.subject	Computer science, Theory methods
dc.title	Curation of historical Arabic handwritten digit datasets from Ottoman population registers: a deep transfer learning case study
dc.type	Conference Proceeding
dspace.entity.type	Publication
local.contributor.kuauthor	Can, Yekta Said
local.contributor.kuauthor	Kabadayı, Mustafa Erdem
local.publication.orgunit1	College of Social Sciences and Humanities
person.familyName	Can
person.familyName	Kabadayı
person.givenName	Yekta Said
person.givenName	Mustafa Erdem
relation.isParentOrgUnitOfPublication	3f7621e3-0d26-42c2-af64-58a329522794
relation.isParentOrgUnitOfPublication.latestForDiscovery	3f7621e3-0d26-42c2-af64-58a329522794

Collections

Publications without Fulltext
