Publication:
Unsupervised learning of Turkish morphology with multiple codebook VQ-VAE

dc.contributor.department: KUIS AI (Koç University & İş Bank Artificial Intelligence Center)
dc.contributor.kuauthor: Yüret, Deniz
dc.contributor.kuauthor: Kural, Müge
dc.contributor.schoolcollegeinstitute: Research Center
dc.date.accessioned: 2025-03-06T21:00:31Z
dc.date.issued: 2024
dc.description.abstract: This paper presents an interpretable unsupervised morphological learning model, showing comparable performance to supervised models in learning complex morphological rules of Turkish as evidenced by its application to the problem of morphological inflection within the SIGMORPHON Shared Tasks. The significance of our unsupervised approach lies in its alignment with how humans naturally acquire rules from raw data without supervision. To achieve this, we construct a model with multiple codebooks of VQ-VAE employing continuous and discrete latent variables during word generation. We evaluate the model’s performance under high and low-resource scenarios, and use probing techniques to examine encoded information in latent representations. We also evaluate its generalization capabilities by testing unseen suffixation scenarios within the SIGMORPHON-UniMorph 2022 Shared Task 0. Our results demonstrate our model’s ability to distinguish word structures into lemmas and suffixes, with each codebook specialized for different morphological features, contributing to the interpretability of our model and effectively performing morphological inflection on both seen and unseen morphological features.
dc.description.indexedby: Scopus
dc.description.publisherscope: International
dc.description.sponsoredbyTubitakEu: N/A
dc.description.sponsorship: We gratefully acknowledge the support of the KUIS AI Center at Koc University, Istanbul, for this work.
dc.identifier.isbn: 9798891761407
dc.identifier.quartile: N/A
dc.identifier.scopus: 2-s2.0-85204733595
dc.identifier.uri: https://hdl.handle.net/20.500.14288/27907
dc.keywords: Unsupervised learning
dc.keywords: Turkish morphology
dc.keywords: Multiple codebook
dc.keywords: VQ-VAE
dc.keywords: Vector quantization
dc.keywords: Natural language processing
dc.keywords: Morphological analysis
dc.keywords: Deep learning
dc.keywords: Neural networks
dc.keywords: Computational linguistics
dc.language.iso: eng
dc.publisher: Association for Computational Linguistics (ACL)
dc.relation.ispartof: SIGTURK 2024 - 1st Workshop on Natural Language Processing for Turkic Languages, Proceedings of the Workshop
dc.subject: Engineering
dc.title: Unsupervised learning of Turkish morphology with multiple codebook VQ-VAE
dc.type: Conference Proceeding
dspace.entity.type: Publication
local.publication.orgunit1: Research Center
local.publication.orgunit2: KUIS AI (Koç University & İş Bank Artificial Intelligence Center)
relation.isOrgUnitOfPublication: 77d67233-829b-4c3a-a28f-bd97ab5c12c7
relation.isOrgUnitOfPublication.latestForDiscovery: 77d67233-829b-4c3a-a28f-bd97ab5c12c7
relation.isParentOrgUnitOfPublication: d437580f-9309-4ecb-864a-4af58309d287
relation.isParentOrgUnitOfPublication.latestForDiscovery: d437580f-9309-4ecb-864a-4af58309d287
