Curation of historical Arabic handwritten digit datasets from Ottoman population registers: a deep transfer learning case study

School / College / Institute

Organizational Unit

College of Social Sciences and Humanities

KU-Authors

Can, Yekta Said

Kabadayı, Mustafa Erdem

Publication Date

2020

Type

Conference Proceeding

Abstract

As efforts to digitize historical manuscripts and archives multiply, automatic information retrieval systems need to extract meaning quickly and reliably. Historical archives pose more challenges for these systems than modern manuscripts do. Retrieving information from them requires more advanced algorithms, archive-specific methods, and preprocessing techniques, and cutting-edge machine learning algorithms should also be applied. One of the most important research issues in historical document analysis is the lack of public datasets. Although plenty of public datasets exist for modern document analysis, the number of public annotated historical archives is limited. Researchers can test novel algorithms on modern datasets and draw some conclusions, but the performance of those algorithms on historical material remains unknown until they are tested on historical datasets. In this study, we created a historical Arabic handwritten digit dataset by combining manual annotation with automatic document analysis techniques. The dataset is open to researchers and contains more than 6,000 digits. We then tested deep transfer learning algorithms and various machine learning techniques to recognize these digits and achieved promising results.

Publisher

IEEE

Subject

Computer science, Artificial intelligence; Computer science, Information systems; Computer science, Theory and methods

Source

2020 IEEE International Conference On Big Data (Big Data)

DOI

10.1109/BigData50022.2020.9378445

URI

https://doi.org/10.1109/BigData50022.2020.9378445
https://hdl.handle.net/20.500.14288/9901
