Instance selection for machine translation using feature decay algorithms

dc.contributor.department	Department of Computer Engineering
dc.contributor.department	Graduate School of Sciences and Engineering
dc.contributor.facultymember	Yes
dc.contributor.kuauthor	Biçici, Ergün
dc.contributor.kuauthor	Yüret, Deniz
dc.contributor.schoolcollegeinstitute	College of Engineering
dc.contributor.schoolcollegeinstitute	GRADUATE SCHOOL OF SCIENCES AND ENGINEERING
dc.date.accessioned	2024-11-10T00:06:43Z
dc.date.issued	2011
dc.description.abstract	We present an empirical study of instance selection techniques for machine translation. In an active learning setting, instance selection minimizes human effort by identifying the most informative sentences for translation. In a transductive learning setting, selecting training instances relevant to the test set improves the final translation quality. After reviewing the state of the art in the field, we generalize the main ideas into a class of instance selection algorithms that use feature decay. Feature decay algorithms increase the diversity of the training set by devaluing features that are already included. We show that the feature decay rate has a very strong effect on the final translation quality, whereas the initial feature values, the inclusion of higher-order features, and sentence length normalization do not. We evaluate the best instance selection methods with a standard Moses baseline that uses the whole 1.6-million-sentence English-German section of the Europarl corpus. We show that selecting the best 3000 training sentences for a specific test sentence is sufficient to obtain a score within 1 BLEU of the baseline, that using 5% of the training data is sufficient to exceed the baseline, and that an improvement of about 2 BLEU over the baseline is possible with an optimally selected subset of the training data. In out-of-domain translation, we are able to reduce the training set size to about 7% and achieve performance similar to the baseline.
dc.description.fulltext	No
dc.description.harvestedfrom	Manual
dc.description.indexedby	Scopus
dc.description.openaccess	YES
dc.description.peerreviewstatus	N/A
dc.description.publisherscope	International
dc.description.readpublish	N/A
dc.description.sponsoredbyTubitakEu	N/A
dc.description.studentonlypublication	No
dc.description.studentpublication	Yes
dc.description.version	N/A
dc.identifier.embargo	N/A
dc.identifier.isbn	978-1-937284-12-1
dc.identifier.link	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85092272393&partnerID=40&md5=165812075c1d082827d55f33369111d5
dc.identifier.quartile	To be checked
dc.identifier.scopus	2-s2.0-85092272393
dc.identifier.uri	https://hdl.handle.net/20.500.14288/16661
dc.keywords	Computational linguistics
dc.keywords	Computer aided language translation
dc.keywords	Decay (organic)
dc.keywords	Machine translation
dc.keywords	Natural language processing systems
dc.keywords	Active Learning
dc.keywords	Empirical studies
dc.keywords	Instance selection
dc.keywords	Learning settings
dc.keywords	Machine translations
dc.keywords	Selection techniques
dc.keywords	Training data
dc.keywords	Training sets
dc.keywords	Transductive learning
dc.keywords	Translation quality
dc.keywords	Feature extraction
dc.language.iso	eng
dc.publisher	Association for Computational Linguistics
dc.relation.affiliation	Koç University
dc.relation.collection	Koç University Institutional Repository
dc.relation.ispartof	WMT 2011 - 6th Workshop on Statistical Machine Translation, Proceedings of the Workshop
dc.relation.openaccess	N/A
dc.rights	N/A
dc.subject	Computer engineering
dc.title	Instance selection for machine translation using feature decay algorithms
dc.type	Conference Proceeding
dspace.entity.type	Publication
local.contributor.kuauthor	Yüret, Deniz
local.contributor.kuauthor	Biçici, Ergün
relation.isOrgUnitOfPublication	89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication	3fc31c89-e803-4eb1-af6b-6258bc42c3d8
relation.isOrgUnitOfPublication.latestForDiscovery	89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isParentOrgUnitOfPublication	8e756b23-2d4a-4ce8-b1b3-62c794a8c164
relation.isParentOrgUnitOfPublication	434c9663-2b11-4e66-9399-c863e2ebae43
relation.isParentOrgUnitOfPublication.latestForDiscovery	8e756b23-2d4a-4ce8-b1b3-62c794a8c164
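
The abstract describes feature decay only in words: selection favors sentences that cover features of the test set, and a feature loses value once it is already included. The sketch below is one possible reading of that idea, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the function names `ngrams` and `select_instances`, the use of word n-grams up to bigrams as features, an initial feature value of 1.0, a multiplicative `decay` of 0.5, and whitespace tokenization.

```python
def ngrams(tokens, max_order=2):
    """Return the set of word n-grams (as tuples) up to max_order."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_order + 1)
        for i in range(len(tokens) - n + 1)
    }


def select_instances(candidates, test_sentences, k, decay=0.5, max_order=2):
    """Greedily pick k candidate sentences that cover test-set features.

    Hypothetical sketch: each test-set n-gram starts with value 1.0, and
    every time a selected sentence contains it, its value is multiplied
    by `decay`, so later picks are pushed toward uncovered features.
    """
    # Features are the n-grams observed in the test sentences (assumption).
    test_features = set()
    for sentence in test_sentences:
        test_features |= ngrams(sentence.split(), max_order)

    value = {feature: 1.0 for feature in test_features}

    # Keep only the features each candidate shares with the test set.
    shared = [ngrams(c.split(), max_order) & test_features for c in candidates]

    remaining = set(range(len(candidates)))
    selected = []
    while remaining and len(selected) < k:
        # Score = sum of current feature values; ties go to the lowest index.
        best = max(sorted(remaining),
                   key=lambda i: sum(value[f] for f in shared[i]))
        selected.append(best)
        remaining.remove(best)
        # Devalue the features that are now included in the selection.
        for feature in shared[best]:
            value[feature] *= decay

    return [candidates[i] for i in selected]


if __name__ == "__main__":
    pool = ["the cat sat", "the cat sat down", "a dog barked", "the cat"]
    test = ["the cat sat", "a dog barked"]
    print(select_instances(pool, test, k=2))             # with decay
    print(select_instances(pool, test, k=2, decay=1.0))  # decay disabled
```

In this toy pool, setting `decay=1.0` disables the devaluation and the second pick repeats material the first pick already covered, whereas a decay below 1 steers the second pick toward the still-uncovered test sentence. The decay rate is exposed as a parameter because the abstract singles it out as the setting with the strongest effect on translation quality.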

Collections

Publications without Fulltext
