Publication:
Modeling morphologically rich languages using splitwords and unstructured dependencies

Placeholder

School / College / Institute

Organizational Unit

Program

KU Authors

Co-Authors

Publication Date

Language

Embargo Status

Journal Title

Journal ISSN

Volume Title

Alternative Title

Abstract

We experiment with splitting words into their stem and suffix components for modeling morphologically rich languages. We show that using a morphological analyzer and disambiguator results in a significant perplexity reduction in Turkish. We present flexible n-gram models, Flex-Grams, which assume that the n-1 tokens that determine the probability of a given token can be chosen anywhere in the sentence rather than the preceding n-1 positions. Our final model achieves 27% perplexity reduction compared to the standard n-gram model.

Source

Publisher

Association for Computational Linguistics (ACL)

Subject

Computer engineering

Citation

Has Part

Source

ACL-IJCNLP 2009 - Joint Conf. of the 47th Annual Meeting of the Association for Computational Linguistics and 4th Int. Joint Conf. on Natural Language Processing of the AFNLP, Proceedings of the Conf.

Book Series Title

Edition

DOI

10.3115/1667583.1667690

item.page.datauri

Link

Rights

Copyrights Note

Endorsement

Review

Supplemented By

Referenced By

0

Views

0

Downloads

View PlumX Details