Unsupervised instance-based part of speech induction using probable substitutes

Departments

KUIS AI (Koç University & İş Bank Artificial Intelligence Center)

School / College / Institute

Research Center

KU-Authors

Yatbaz, Mehmet Ali

Yüret, Deniz

Sert, Enis Rıfat

Date

2014

Type

Conference Proceeding

Embargo Status

N/A

Abstract

We develop an instance (token)-based extension of the state-of-the-art word (type)-based part-of-speech induction system introduced in (Yatbaz et al., 2012). Each word instance is represented by a feature vector that combines information from the target word and probable substitutes sampled from an n-gram model representing its context. Modeling ambiguity with an instance-based model does not lead to significant gains in overall part-of-speech tagging accuracy, because most words in running text are used in their most frequent class (e.g. 93.69% in the Penn Treebank). However, it is important to model ambiguity because most frequent words are ambiguous, and not modeling them correctly may negatively affect upstream tasks. Our main contribution is to show that an instance-based model can achieve significantly higher accuracy on ambiguous words at the cost of a slight degradation on unambiguous ones, while maintaining comparable overall accuracy. On the Penn Treebank, the overall many-to-one accuracy of the system is within 1% of the state of the art (80%), while on highly ambiguous words it is up to 70% better. In multilingual experiments, our results are significantly better than or comparable to the best published word- or instance-based systems on 15 out of 19 corpora in 15 languages. The vector representations for words used in our system are available for download for further experiments.
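The abstract describes each token as a feature vector built from the target word plus substitutes sampled from an n-gram model of its context. The sketch below is a minimal illustration under our own assumptions, not the authors' implementation. It uses a toy add-one-smoothed bigram model (the model order and smoothing used in the paper are not stated on this page). It scores a candidate substitute w at position i by P(w | left neighbor) · P(right neighbor | w), samples substitutes from the normalized scores, and represents the instance as a sparse count vector with one target-word feature plus the sampled substitutes. The function names `train_bigram`, `substitute_distribution` and `instance_features` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count bigrams over sentences padded with <s> and </s>.

    `histories[a]` counts how often token a appears as the left side of a bigram.
    """
    histories = Counter()
    bigrams = defaultdict(Counter)
    for sent in sentences:
        toks = ["<s>"] + sent + ["</s>"]
        for a, b in zip(toks, toks[1:]):
            histories[a] += 1
            bigrams[a][b] += 1
    return histories, bigrams


def bigram_prob(histories, bigrams, n_outcomes, prev, word):
    # Add-one (Laplace) smoothed P(word | prev); an assumption made for this sketch.
    return (bigrams[prev][word] + 1) / (histories[prev] + n_outcomes)


def substitute_distribution(sent, i, histories, bigrams, vocab):
    """Normalized P(w | context) for position i, scored as P(w | left) * P(right | w)."""
    toks = ["<s>"] + sent + ["</s>"]
    left, right = toks[i], toks[i + 2]  # neighbors of sent[i] in the padded list
    n_outcomes = len(vocab) + 1  # every word type plus </s>
    scores = {
        w: bigram_prob(histories, bigrams, n_outcomes, left, w)
        * bigram_prob(histories, bigrams, n_outcomes, w, right)
        for w in vocab
    }
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}


def instance_features(sent, i, histories, bigrams, vocab, n_samples=10, rng=None):
    """Sparse features for one token: its identity plus sampled substitute counts."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    dist = substitute_distribution(sent, i, histories, bigrams, vocab)
    words = sorted(dist)
    subs = rng.choices(words, weights=[dist[w] for w in words], k=n_samples)
    feats = Counter({"word=" + sent[i]: 1})
    feats.update("sub=" + s for s in subs)
    return feats


# Usage on a tiny made-up corpus (illustrative data, not from the paper).
corpus = [["the", "dog", "runs"], ["the", "cat", "runs"], ["a", "dog", "sleeps"]]
hist, bi = train_bigram(corpus)
vocab = sorted({w for s in corpus for w in s})
feats = instance_features(["the", "dog", "sleeps"], 1, hist, bi, vocab, n_samples=20)
print(feats.most_common(3))
```

In this toy setting, "dog" and "cat" receive more substitute mass than "the" at the position after "the", so two tokens of the same word type in different contexts get different substitute counts. That is the property an instance-based model relies on to separate ambiguous uses. Clustering such instance vectors into induced tags is outside this sketch.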

Publisher

Association for Computational Linguistics (ACL)

Subject

Computer Science, Artificial Intelligence

Source

COLING 2014 - 25th International Conference on Computational Linguistics, Proceedings of COLING 2014: Technical Papers

URI

https://hdl.handle.net/20.500.14288/10949

Link

https://aclanthology.org/C14-1217.pdf

Rights

N/A

Collections

Publications without Fulltext
