Publication: Instance selection for machine translation using feature decay algorithms
dc.contributor.department | Department of Computer Engineering | |
dc.contributor.department | Graduate School of Sciences and Engineering | |
dc.contributor.kuauthor | Biçici, Ergün | |
dc.contributor.kuauthor | Yüret, Deniz | |
dc.contributor.schoolcollegeinstitute | College of Engineering | |
dc.contributor.schoolcollegeinstitute | GRADUATE SCHOOL OF SCIENCES AND ENGINEERING | |
dc.date.accessioned | 2024-11-10T00:06:43Z | |
dc.date.issued | 2011 | |
dc.description.abstract | We present an empirical study of instance selection techniques for machine translation. In an active learning setting, instance selection minimizes human effort by identifying the most informative sentences for translation. In a transductive learning setting, selecting training instances relevant to the test set improves the final translation quality. After reviewing the state of the art in the field, we generalize the main ideas into a class of instance selection algorithms that use feature decay. Feature decay algorithms increase the diversity of the training set by devaluing features that are already included. We show that the feature decay rate has a very strong effect on the final translation quality, whereas the initial feature values, inclusion of higher order features, and sentence length normalizations do not. We evaluate the best instance selection methods against a standard Moses baseline trained on the whole 1.6 million sentence English-German section of the Europarl corpus. We show that selecting the best 3000 training sentences for a specific test sentence is sufficient to obtain a score within 1 BLEU of the baseline, that using 5% of the training data is sufficient to exceed the baseline, and that a ∼2 BLEU improvement over the baseline is possible with an optimally selected subset of the training data. In out-of-domain translation, we are able to reduce the training set size to about 7% and achieve performance similar to the baseline. | |
dc.description.indexedby | Scopus | |
dc.description.openaccess | YES | |
dc.description.publisherscope | International | |
dc.description.sponsoredbyTubitakEu | N/A | |
dc.identifier.isbn | 978-1-937284-12-1 | |
dc.identifier.link | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85092272393&partnerID=40&md5=165812075c1d082827d55f33369111d5 | |
dc.identifier.quartile | N/A | |
dc.identifier.scopus | 2-s2.0-85092272393 | |
dc.identifier.uri | https://hdl.handle.net/20.500.14288/16661 | |
dc.keywords | Computational linguistics | |
dc.keywords | Computer aided language translation | |
dc.keywords | Decay (organic) | |
dc.keywords | Machine translation | |
dc.keywords | Natural language processing systems | |
dc.keywords | Active Learning | |
dc.keywords | Empirical studies | |
dc.keywords | Instance selection | |
dc.keywords | Learning settings | |
dc.keywords | Machine translations | |
dc.keywords | Selection techniques | |
dc.keywords | Training data | |
dc.keywords | Training sets | |
dc.keywords | Transductive learning | |
dc.keywords | Translation quality | |
dc.keywords | Feature extraction | |
dc.language.iso | eng | |
dc.publisher | Association for Computational Linguistics | |
dc.relation.ispartof | WMT 2011 - 6th Workshop on Statistical Machine Translation, Proceedings of the Workshop | |
dc.subject | Computer engineering | |
dc.title | Instance selection for machine translation using feature decay algorithms | |
dc.type | Conference Proceeding | |
dspace.entity.type | Publication | |
local.contributor.kuauthor | Yüret, Deniz | |
local.contributor.kuauthor | Biçici, Ergün | |
local.publication.orgunit1 | College of Engineering | |
local.publication.orgunit1 | GRADUATE SCHOOL OF SCIENCES AND ENGINEERING | |
local.publication.orgunit2 | Department of Computer Engineering | |
local.publication.orgunit2 | Graduate School of Sciences and Engineering | |
relation.isOrgUnitOfPublication | 89352e43-bf09-4ef4-82f6-6f9d0174ebae | |
relation.isOrgUnitOfPublication | 3fc31c89-e803-4eb1-af6b-6258bc42c3d8 | |
relation.isOrgUnitOfPublication.latestForDiscovery | 89352e43-bf09-4ef4-82f6-6f9d0174ebae | |
relation.isParentOrgUnitOfPublication | 8e756b23-2d4a-4ce8-b1b3-62c794a8c164 | |
relation.isParentOrgUnitOfPublication | 434c9663-2b11-4e66-9399-c863e2ebae43 | |
relation.isParentOrgUnitOfPublication.latestForDiscovery | 8e756b23-2d4a-4ce8-b1b3-62c794a8c164 |
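
Illustrative note: the abstract describes feature decay algorithms as increasing training-set diversity by devaluing features that are already included. The sketch below shows one way that idea could look as a greedy selection loop. It is not the authors' implementation. The n-gram features (up to bigrams), the unit initial feature value, the exponential decay rule `decay ** count`, and the names `ngrams` and `select_instances` are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, max_n=2):
    """Return all n-grams of length 1..max_n as tuples (assumed feature set)."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def select_instances(pool, test_sentences, k, decay=0.5, max_n=2):
    """Greedily pick up to k pool sentences that cover test-set features.

    Assumption: every feature starts with value 1.0, and each time a
    selected sentence contains it, its value is multiplied by `decay`.
    """
    # Features worth covering are those that occur in the test set.
    target = {f for s in test_sentences for f in ngrams(s.split(), max_n)}
    # For each pool sentence, keep only its features that the test set shares.
    feats = [set(ngrams(s.split(), max_n)) & target for s in pool]

    used = Counter()                   # times each feature has been selected
    remaining = list(range(len(pool)))
    selected = []

    for _ in range(min(k, len(pool))):
        def score(i):
            # Sum of current (decayed) values of the sentence's features.
            return sum(decay ** used[f] for f in feats[i])

        best = max(remaining, key=score)   # ties go to the earliest sentence
        selected.append(pool[best])
        remaining.remove(best)
        used.update(feats[best])           # devalue the features just covered

    return selected


pool = ["the cat sat", "the cat sat", "a dog barked"]
test = ["the cat sat", "a dog barked"]
print(select_instances(pool, test, k=2))             # decayed scoring
print(select_instances(pool, test, k=2, decay=1.0))  # no decay
```

With `decay=0.5` the second pick skips the duplicate sentence, because its features have already lost half their value. With `decay=1.0` (no decay) the duplicate scores as high as any unseen sentence and is picked again, which is the loss of diversity that decay is meant to prevent.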