Publication:
Context-based sentence alignment in parallel corpora

dc.contributor.department: Graduate School of Sciences and Engineering
dc.contributor.schoolcollegeinstitute: GRADUATE SCHOOL OF SCIENCES AND ENGINEERING
dc.date.accessioned: 2024-11-09T23:26:06Z
dc.date.issued: 2008
dc.description.abstract: This paper presents a language-independent, context-based sentence alignment technique for parallel corpora. We view the problem of aligning sentences as finding translations of sentences chosen from different sources. Unlike current approaches, which rely on predefined features and models, our algorithm employs features derived from the distributional properties of words and does not use any language-dependent knowledge. We make use of the context of sentences and the notion of Zipfian word vectors, which effectively model the distributional properties of words in a given sentence. We take the context to be the frame in which reasoning about sentence alignment is done. We evaluate the performance of our system using two measures: sentence alignment accuracy and sentence alignment coverage. We compare our system with commonly used sentence alignment systems and show that it performs 1.2149 to 1.6022 times better in reducing the error rate in alignment accuracy and coverage for moderately sized corpora.
dc.description.indexedby: WOS
dc.description.indexedby: Scopus
dc.description.openaccess: NO
dc.description.publisherscope: International
dc.description.sponsoredbyTubitakEu: N/A
dc.description.volume: 4919
dc.identifier.isbn: 978-3-540-78134-9
dc.identifier.issn: 0302-9743
dc.identifier.scopus: 2-s2.0-49949083112
dc.identifier.uri: https://hdl.handle.net/20.500.14288/11490
dc.identifier.wos: 253658200037
dc.keywords: Sentence alignment
dc.keywords: Context
dc.keywords: Zipfian word vectors
dc.keywords: Multilingual
dc.language.iso: eng
dc.publisher: Springer-Verlag Berlin
dc.relation.ispartof: Computational Linguistics and Intelligent Text Processing
dc.subject: Computer science
dc.subject: Artificial intelligence
dc.subject: Computer science
dc.subject: Theory and methods
dc.title: Context-based sentence alignment in parallel corpora
dc.type: Conference Proceeding
dspace.entity.type: Publication
local.contributor.kuauthor: Biçici, Ergun
local.publication.orgunit1: GRADUATE SCHOOL OF SCIENCES AND ENGINEERING
local.publication.orgunit2: Graduate School of Sciences and Engineering
relation.isOrgUnitOfPublication: 3fc31c89-e803-4eb1-af6b-6258bc42c3d8
relation.isOrgUnitOfPublication.latestForDiscovery: 3fc31c89-e803-4eb1-af6b-6258bc42c3d8
relation.isParentOrgUnitOfPublication: 434c9663-2b11-4e66-9399-c863e2ebae43
relation.isParentOrgUnitOfPublication.latestForDiscovery: 434c9663-2b11-4e66-9399-c863e2ebae43
