Publication: Modeling morphologically rich languages using splitwords and unstructured dependencies
Publication Date
2009
Language
English
Type
Conference proceeding
Abstract
We experiment with splitting words into their stem and suffix components for modeling morphologically rich languages. We show that using a morphological analyzer and disambiguator results in a significant perplexity reduction in Turkish. We present flexible n-gram models, Flex-Grams, which assume that the n-1 tokens that determine the probability of a given token can be chosen anywhere in the sentence rather than only from the preceding n-1 positions. Our final model achieves a 27% perplexity reduction compared to the standard n-gram model.
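As a rough illustration of the two ideas in the abstract, the following Python sketch splits words into stem and suffix tokens and then picks the conditioning tokens for a position from arbitrary relative offsets instead of the immediately preceding ones. Everything in it is an assumption made for illustration: the `TOY_SEGMENTATIONS` table stands in for a real morphological analyzer and disambiguator, the `+` suffix marker and `<pad>` symbol are arbitrary conventions, and the offsets are chosen by hand. The abstract does not say how the paper selects context positions, so this is not the authors' procedure.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy segmentations; a real system would use a morphological
# analyzer and disambiguator, as the abstract describes.
TOY_SEGMENTATIONS: Dict[str, Tuple[str, str]] = {
    "evlerde": ("ev", "+lerde"),
    "kitaplar": ("kitap", "+lar"),
}


def split_words(words: Sequence[str]) -> List[str]:
    """Replace each word with its stem and suffix tokens when a split is known."""
    tokens: List[str] = []
    for word in words:
        if word in TOY_SEGMENTATIONS:
            stem, suffix = TOY_SEGMENTATIONS[word]
            tokens.extend([stem, suffix])
        else:
            # Words without a known split are kept as single tokens.
            tokens.append(word)
    return tokens


def context_at(tokens: Sequence[str], i: int, offsets: Sequence[int]) -> Tuple[str, ...]:
    """Return the conditioning tokens for position i at the given relative offsets.

    Offsets that fall outside the sentence map to the padding symbol "<pad>".
    """
    context: List[str] = []
    for offset in offsets:
        j = i + offset
        context.append(tokens[j] if 0 <= j < len(tokens) else "<pad>")
    return tuple(context)


tokens = split_words(["evlerde", "kitaplar", "var"])
# Standard trigram context for the suffix "+lar": the two preceding tokens.
standard = context_at(tokens, 3, (-1, -2))
# A flexible context: its own stem plus the previous word's stem.
flexible = context_at(tokens, 3, (-1, -3))
```

With these toy inputs, the standard context for `+lar` is its stem and the previous suffix, while the flexible context skips the suffix and reaches the earlier stem `ev`. A full model would also need probability estimates over such contexts, which this sketch leaves out.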
Description
Source:
ACL-IJCNLP 2009: Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Proceedings of the Conference
Publisher:
Association for Computational Linguistics (ACL)
Subject
Computer engineering