Modeling morphologically rich languages using splitwords and unstructured dependencies

Date: 2024-11-09
Year: 2009
ISBN: 9781-6173-8258-1
DOI: 10.3115/1667583.1667690
Scopus EID: 2-s2.0-84859062288
URL: https://doi.org/10.3115/1667583.1667690
URL: https://hdl.handle.net/20.500.14288/14609
Language: eng
Subject: Computer engineering
Type: Conference Proceeding
To be reviewed
9834

We experiment with splitting words into their stem and suffix components for modeling morphologically rich languages. We show that using a morphological analyzer and disambiguator results in a significant perplexity reduction for Turkish. We present flexible n-gram models, Flex-Grams, which assume that the n-1 tokens that determine the probability of a given token can be chosen anywhere in the sentence rather than only from the preceding n-1 positions. Our final model achieves a 27% perplexity reduction compared to the standard n-gram model.
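
The idea of conditioning on a token that is not the immediate predecessor can be illustrated with a small sketch. The code below is not the paper's Flex-Gram estimation procedure: it is a toy bigram-style model in which the single conditioning token sits a fixed `offset` positions to the left, with add-one smoothing. The function names (`train`, `perplexity`, `relative_reduction`), the `+suffix` token notation, and the hand-segmented toy sentences are all assumptions made for illustration; no morphological analyzer is involved.

```python
import math
from collections import Counter

def train(sentences, offset):
    """Count (context, token) pairs where the context token sits
    `offset` positions to the left of the predicted token.
    offset=1 is an ordinary bigram; offset=2 skips one token."""
    pair_counts, ctx_counts, vocab = Counter(), Counter(), set()
    for sent in sentences:
        padded = ["<s>"] * offset + sent  # pad so every token has a context
        for i in range(offset, len(padded)):
            ctx, tok = padded[i - offset], padded[i]
            pair_counts[(ctx, tok)] += 1
            ctx_counts[ctx] += 1
            vocab.add(tok)
    return pair_counts, ctx_counts, vocab

def perplexity(sentences, model, offset):
    """Per-token perplexity under add-one smoothing."""
    pair_counts, ctx_counts, vocab = model
    v = len(vocab) + 1  # one extra slot reserved for unseen tokens
    log_prob, n = 0.0, 0
    for sent in sentences:
        padded = ["<s>"] * offset + sent
        for i in range(offset, len(padded)):
            ctx, tok = padded[i - offset], padded[i]
            p = (pair_counts[(ctx, tok)] + 1) / (ctx_counts[ctx] + v)
            log_prob += math.log(p)
            n += 1
    return math.exp(-log_prob / n)

def relative_reduction(baseline, new):
    """Fractional perplexity reduction relative to a baseline."""
    return (baseline - new) / baseline

# Hand-segmented toy data (hypothetical): stems followed by "+suffix" tokens.
toy = [
    ["ev", "+ler", "kitap", "+lar"],
    ["kitap", "+lar", "ev", "+ler"],
    ["ev", "+ler", "ev", "+ler"],
]

if __name__ == "__main__":
    for offset in (1, 2):
        model = train(toy, offset)
        print(offset, perplexity(toy, model, offset))
```

With split words, the token immediately to the left of a stem is usually the previous word's suffix, so a larger offset lets a stem be conditioned on the previous stem instead. The reported figure is a relative quantity of the kind `relative_reduction` computes: a drop from a baseline perplexity of 100 to 73, for instance, would be a 27% reduction. Those two numbers are purely illustrative and do not come from the record.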