Publication:
Instance selection for machine translation using feature decay algorithms

dc.contributor.department: Department of Computer Engineering
dc.contributor.department: Graduate School of Sciences and Engineering
dc.contributor.kuauthor: Biçici, Ergün
dc.contributor.kuauthor: Yüret, Deniz
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.contributor.schoolcollegeinstitute: GRADUATE SCHOOL OF SCIENCES AND ENGINEERING
dc.date.accessioned: 2024-11-10T00:06:43Z
dc.date.issued: 2011
dc.description.abstract: We present an empirical study of instance selection techniques for machine translation. In an active learning setting, instance selection minimizes human effort by identifying the most informative sentences for translation. In a transductive learning setting, selecting training instances relevant to the test set improves the final translation quality. After reviewing the state of the art in the field, we generalize the main ideas in a class of instance selection algorithms that use feature decay. Feature decay algorithms increase the diversity of the training set by devaluing features that are already included. We show that the feature decay rate has a very strong effect on the final translation quality, whereas the initial feature values, the inclusion of higher order features, or sentence length normalizations do not. We evaluate the best instance selection methods against a standard Moses baseline trained on the whole 1.6 million sentence English-German section of the Europarl corpus. We show that selecting the best 3000 training sentences for a specific test sentence is sufficient to obtain a score within 1 BLEU of the baseline, that using 5% of the training data is sufficient to exceed the baseline, and that an improvement of about 2 BLEU over the baseline is possible with an optimally selected subset of the training data. In out-of-domain translation, we are able to reduce the training set size to about 7% and achieve performance similar to the baseline.
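The abstract describes feature decay only in outline. As a minimal sketch of the general idea (not the authors' implementation), the Python below assumes whitespace-tokenized sentences, test-set n-grams up to order `max_n` as features, an initial value of 1.0 per feature, exponential decay by a factor `decay` each time a selected sentence contains the feature, and optional division by sentence length. The names `ngrams`, `feature_decay_select`, `max_n`, `decay`, and `normalize` are hypothetical, and the paper's actual initial values, decay functions, and normalizations may differ.

```python
from typing import List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: List[str], max_n: int) -> Set[NGram]:
    """Return the set of all n-grams of order 1..max_n in a token list."""
    return {tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)}


def feature_decay_select(train: List[str], test: List[str], k: int,
                         max_n: int = 2, decay: float = 0.5,
                         normalize: bool = True) -> List[int]:
    """Greedily return indices of up to k training sentences (hypothetical helper).

    Assumption: features are test-set n-grams, each starting at value 1.0.
    Each time a selected sentence contains a feature, that feature's value is
    multiplied by `decay`, so later candidates gain less from covered features.
    """
    test_feats: Set[NGram] = set()
    for sent in test:
        test_feats |= ngrams(sent.split(), max_n)
    value = {f: 1.0 for f in test_feats}

    # Only features that occur in the test set contribute to a candidate's score.
    cand_feats = [ngrams(s.split(), max_n) & test_feats for s in train]
    lengths = [max(len(s.split()), 1) for s in train]

    def score(i: int) -> float:
        total = sum(value[f] for f in cand_feats[i])
        # Optional sentence length normalization (divide by token count).
        return total / lengths[i] if normalize else total

    selected: List[int] = []
    remaining = set(range(len(train)))
    while remaining and len(selected) < k:
        # max() keeps the first maximum, so ties go to the lowest index.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
        for f in cand_feats[best]:
            value[f] *= decay  # devalue features already included
    return selected


if __name__ == "__main__":
    train = ["the cat sat", "the cat sat", "on the mat"]
    test = ["the cat sat on the mat"]
    print(feature_decay_select(train, test, k=2, decay=0.5))  # [0, 2]
    print(feature_decay_select(train, test, k=2, decay=1.0))  # [0, 1]
```

With `decay=1.0` nothing is devalued, so the duplicate sentence is picked second. With `decay=0.5` the sentence that covers the still-unseen n-grams wins instead, which illustrates how decay increases the diversity of the selected training set.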
dc.description.indexedby: Scopus
dc.description.openaccess: YES
dc.description.publisherscope: International
dc.description.sponsoredbyTubitakEu: N/A
dc.identifier.isbn: 9781-9372-8412-1
dc.identifier.link: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85092272393&partnerID=40&md5=165812075c1d082827d55f33369111d5
dc.identifier.quartile: N/A
dc.identifier.scopus: 2-s2.0-85092272393
dc.identifier.uri: https://hdl.handle.net/20.500.14288/16661
dc.keywords: Computational linguistics
dc.keywords: Computer aided language translation
dc.keywords: Decay (organic)
dc.keywords: Machine translation
dc.keywords: Natural language processing systems
dc.keywords: Active Learning
dc.keywords: Empirical studies
dc.keywords: Instance selection
dc.keywords: Learning settings
dc.keywords: Machine translations
dc.keywords: Selection techniques
dc.keywords: Training data
dc.keywords: Training sets
dc.keywords: Transductive learning
dc.keywords: Translation quality
dc.keywords: Feature extraction
dc.language.iso: eng
dc.publisher: Association for Computational Linguistics
dc.relation.ispartof: WMT 2011 - 6th Workshop on Statistical Machine Translation, Proceedings of the Workshop
dc.subject: Computer engineering
dc.title: Instance selection for machine translation using feature decay algorithms
dc.type: Conference Proceeding
dspace.entity.type: Publication
local.contributor.kuauthor: Yüret, Deniz
local.contributor.kuauthor: Biçici, Ergün
local.publication.orgunit1: College of Engineering
local.publication.orgunit1: GRADUATE SCHOOL OF SCIENCES AND ENGINEERING
local.publication.orgunit2: Department of Computer Engineering
local.publication.orgunit2: Graduate School of Sciences and Engineering
relation.isOrgUnitOfPublication: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication: 3fc31c89-e803-4eb1-af6b-6258bc42c3d8
relation.isOrgUnitOfPublication.latestForDiscovery: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isParentOrgUnitOfPublication: 8e756b23-2d4a-4ce8-b1b3-62c794a8c164
relation.isParentOrgUnitOfPublication: 434c9663-2b11-4e66-9399-c863e2ebae43
relation.isParentOrgUnitOfPublication.latestForDiscovery: 8e756b23-2d4a-4ce8-b1b3-62c794a8c164