Publication: Unsupervised instance-based part of speech induction using probable substitutes
dc.contributor.department | Department of Computer Engineering | |
dc.contributor.department | Graduate School of Sciences and Engineering | |
dc.contributor.department | KUIS AI (Koç University & İş Bank Artificial Intelligence Center) | |
dc.contributor.kuauthor | Yatbaz, Mehmet Ali | |
dc.contributor.kuauthor | Yüret, Deniz | |
dc.contributor.schoolcollegeinstitute | College of Engineering | |
dc.contributor.schoolcollegeinstitute | GRADUATE SCHOOL OF SCIENCES AND ENGINEERING | |
dc.contributor.schoolcollegeinstitute | Research Center | |
dc.date.accessioned | 2024-11-09T23:21:45Z | |
dc.date.issued | 2014 | |
dc.description.abstract | We develop an instance (token) based extension of the state-of-the-art word (type) based part-of-speech induction system introduced in (Yatbaz et al., 2012). Each word instance is represented by a feature vector that combines information from the target word and probable substitutes sampled from an n-gram model representing its context. Modeling ambiguity using an instance-based model does not lead to significant gains in overall accuracy in part-of-speech tagging because most words in running text are used in their most frequent class (e.g. 93.69% in the Penn Treebank). However, it is important to model ambiguity because most frequent words are ambiguous and not modeling them correctly may negatively affect upstream tasks. Our main contribution is to show that an instance-based model can achieve significantly higher accuracy on ambiguous words at the cost of a slight degradation on unambiguous ones, maintaining a comparable overall accuracy. On the Penn Treebank, the overall many-to-one accuracy of the system is within 1% of the state-of-the-art (80%), while on highly ambiguous words it is up to 70% better. In multilingual experiments, our results are significantly better than or comparable to the best published word-based or instance-based systems on 15 out of 19 corpora in 15 languages. The vector representations for words used in our system are available for download for further experiments. | |
dc.description.indexedby | Scopus | |
dc.description.openaccess | YES | |
dc.description.publisherscope | International | |
dc.description.sponsoredbyTubitakEu | N/A | |
dc.description.sponsorship | Baidu | |
dc.description.sponsorship | eBay | |
dc.description.sponsorship | Microsoft | |
dc.description.sponsorship | Symantec | |
dc.identifier.isbn | 978-1-941643-26-6 | |
dc.identifier.link | https://www.scopus.com/inward/record.uri?eid=2-s2.0-84959884159&partnerID=40&md5=56b6c16081b66947f8c5bdfe9dc3d7ca | |
dc.identifier.scopus | 2-s2.0-84959884159 | |
dc.identifier.uri | https://hdl.handle.net/20.500.14288/10949 | |
dc.keywords | Forestry | |
dc.keywords | Linguistics | |
dc.keywords | Feature vectors | |
dc.keywords | Induction system | |
dc.keywords | N-gram modeling | |
dc.keywords | Overall accuracies | |
dc.keywords | Part of speech tagging | |
dc.keywords | Part-of-speech inductions | |
dc.keywords | State of the art | |
dc.keywords | Vector representations | |
dc.keywords | Computational linguistics | |
dc.language.iso | eng | |
dc.publisher | Association for Computational Linguistics (ACL) | |
dc.relation.ispartof | COLING 2014 - 25th International Conference on Computational Linguistics, Proceedings of COLING 2014: Technical Papers | |
dc.subject | Computer Science | |
dc.subject | Artificial intelligence | |
dc.title | Unsupervised instance-based part of speech induction using probable substitutes | |
dc.type | Conference Proceeding | |
dspace.entity.type | Publication | |
local.contributor.kuauthor | Yatbaz, Mehmet Ali | |
local.contributor.kuauthor | Sert, Enis Rıfat | |
local.contributor.kuauthor | Yüret, Deniz | |
local.publication.orgunit1 | GRADUATE SCHOOL OF SCIENCES AND ENGINEERING | |
local.publication.orgunit1 | College of Engineering | |
local.publication.orgunit1 | Research Center | |
local.publication.orgunit2 | Department of Computer Engineering | |
local.publication.orgunit2 | KUIS AI (Koç University & İş Bank Artificial Intelligence Center) | |
local.publication.orgunit2 | Graduate School of Sciences and Engineering | |
relation.isOrgUnitOfPublication | 89352e43-bf09-4ef4-82f6-6f9d0174ebae | |
relation.isOrgUnitOfPublication | 3fc31c89-e803-4eb1-af6b-6258bc42c3d8 | |
relation.isOrgUnitOfPublication | 77d67233-829b-4c3a-a28f-bd97ab5c12c7 | |
relation.isOrgUnitOfPublication.latestForDiscovery | 89352e43-bf09-4ef4-82f6-6f9d0174ebae | |
relation.isParentOrgUnitOfPublication | 8e756b23-2d4a-4ce8-b1b3-62c794a8c164 | |
relation.isParentOrgUnitOfPublication | 434c9663-2b11-4e66-9399-c863e2ebae43 | |
relation.isParentOrgUnitOfPublication | d437580f-9309-4ecb-864a-4af58309d287 | |
relation.isParentOrgUnitOfPublication.latestForDiscovery | 8e756b23-2d4a-4ce8-b1b3-62c794a8c164 |